Highly Configurable Neuromorphic Hardware Systems
Abstract In this paper we present a methodological framework
that meets novel requirements emerging from upcoming types of
accelerated and highly configurable neuromorphic hardware
systems. We describe in detail a device with 45 million
programmable and dynamic synapses that is currently under
development, and we sketch the conceptual
challenges that arise from taking this platform into opera- 
tion. More specifically, we aim at the establishment of this 
neuromorphic system as a flexible and neuroscientifically 
valuable modeling tool that can be used by non-hardware- 
experts. We consider various functional aspects to be cru- 
cial for this purpose, and we introduce a consistent workflow 
with detailed descriptions of all involved modules that im- 
plement the suggested steps: The integration of the hardware 
interface into the simulator-independent model description 
language PyNN; a fully automated translation between the 
PyNN domain and appropriate hardware configurations; an 
executable specification of the future neuromorphic sys- 
tem that can be seamlessly integrated into this biology-to- 
hardware mapping process as a test bench for all software 
layers and possible hardware design modifications; an eval- 
uation scheme that deploys models from a dedicated bench- 
mark library, compares the results generated by virtual or 
prototype hardware devices with reference software simula- 
tions and analyzes the differences. The integration of these 
components into one hardware-software workflow provides
an ecosystem for ongoing preparative studies that support 
the hardware design process and represents the basis for the 
maturity of the model-to-hardware mapping software. The 
functionality and flexibility of the latter is proven with a va- 
riety of experimental results. 

Keywords Neuromorphic • VLSI • Hardware • Wafer-Scale •
Software • Modeling • Computational Neuroscience • PyNN
1 Introduction

Neuroscience and Technology 

Advances in neuroscience have often gone hand in hand 
with significant progress in the applied technologies, tools
and methods. While the experimental investigation of living 
neural tissue is indispensable for the generation of a detailed 
knowledge base of the brain, from which understanding of 
underlying principles can emerge, technological difficulties 
have always imposed limits on this endeavor. To date, it
is not possible to study relevant observables in a sufficiently
large fraction of brain tissue under realistic conditions and
with a spatiotemporal resolution that is high enough to fully
capture, and possibly consistently explain, the mechanisms
of higher-order brain functions.

Therefore, in neuroscience, as in any other research
field on dynamical systems that cannot be fully explored 
by experimental methods, models represent an indispens- 
able approach to test hypotheses and theories on the real 
subject of interest. However, even neural modeling is sig- 
nificantly constrained and influenced by the set of avail- 
able technologies. The spectrum of feasible experimental 
setups, in particular in computational neuroscience, directly 
depends on the accessible computational power. The diffi- 
culty of efficiently mapping the massive parallelism of neu- 
ral computation in biological tissue to a limited number of 
digital general purpose CPUs is a crucial bottleneck in the 
development of large-scale computational models of neural 
networks, where statistics-intensive analyses or long-term 
observations of network dynamics can become computa- 
tionally extremely expensive (see e.g. Morrison et al, 2005; 
Brette et al, 2006; Morrison et al, 2007). 

Neuromorphic Hardware 

For an alternative modeling approach, the so-called neuro- 
morphic engineering, the technology-driven nature is even 
more obvious. In a physical, typically silicon form, neuro- 
morphic devices mimic the structure and emulate the func- 
tion of biological neural networks. This branch of neuro- 
science has its origins in the 1980s (Mead and Mahowald, 
1988; Mead, 1989, 1990), and today an active community is 
working on analog or mixed-signal VLSI models of neural
systems (for reviews see e.g. Renaud et al, 2007; Indiveri 
et al, 2009). 

Dedicated implementations of said computational mod- 
els are typically more power efficient compared to general 
purpose architectures and are well suited for e.g. embed- 
ded controllers of autonomous units like robots. Fault toler- 
ance features observed in biological neural architectures are 
expected to apply to corresponding neuromorphic hardware 



implementations as well. This fact can offer one important 
way to create reliable computing components on the basis 
of future nano-scale hardware constituents, where current 
design strategies will run into serious yield problems. More- 
over, the inherent parallelism of on-chip emulation of neural 
dynamics has the potential to overcome the aforementioned 
scaling limitations of pure software simulations. 

Still, to date the focus of neuromorphic projects is
mostly very application-specific. The majority of groups
are working on neuromorphic sensors such as silicon retinas
and visual processing systems (Netter and Franceschini,
2002; Delbrück and Liu, 2004; Serrano-Gotarredona et al,
2006; Merolla and Boahen, 2006; Fu et al, 2008;
Gomez-Rodriguez et al, 2010) or motor control in robotics
(Lewis et al, 2000). The requirement of communication with
the environment is one important reason why nearly all
neuromorphic devices reported so far are designed to operate
in real-time. But even the projects that deal with mimicking,
studying or applying neural information processing
(Vogelstein et al, 2007), self-organization (Häfliger, 2007;
Mitra et al, 2009) or even hybrid setups coupling neuromorphic
devices with living tissue (Bontorin et al, 2007) are usually
focused on one type of neural architecture, one anatomical
region or one function the implemented network is supposed
to fulfill.

Two main reasons for this self-limitation of neuromorphic
development are the finite size of every neuromorphic
device and the limited possibilities to change the behavior
of individual cells and the network connection patterns
once they have been cast into silicon. A typical approach
to reduce size limitations is to scale up networks
by inter-connecting multiple hardware modules
(Costas-Santos et al, 2007; Berge and Häfliger, 2007;
Indiveri, 2008; Schemmel et al, 2008). Furthermore, recent
advances in neuromorphic development promise to overcome
the limited flexibility of hardware models by offering a
sufficiently fine-grained configurability of both the neuron
parameter values and the network connectivity (Indiveri
et al, 2006; Schemmel et al, 2007; Ehrlich et al, 2007;
Schemmel et al, 2008; Indiveri et al, 2009; Schemmel et al,
2010). This crucial feature makes it possible to consider
neuromorphic systems as flexible modeling tools for
approaching open neuroscientific questions with new
strategies (Kaplan et al, 2009; Brüderle and Müller et al,
2009; Brüderle, 2009; Brüderle et al, 2010).

A Novel Methodological Approach 

The FACETS research project (FACETS, 2010) and its
successor BrainScaleS (BrainScaleS, 2010) aim at a
comprehensive exploitation of the possibilities inherent to
that approach. The highly interdisciplinary collaborations gather
neurophysiological, theoretical and hardware expertise in
order to develop and operate a large-scale neuromorphic de- 
vice that can serve as a flexible neural network emulation 
platform with hitherto unattained configurability and accel- 
eration. It is planned to exploit this combination of fea- 
tures with experimental paradigms that are not realizable 
with pure software simulations, like long-term learning stud- 
ies, systematic parameter explorations and the acquisition of 
statistics for every tested setup. 

Following this attempt, one important insight has 
emerged that has only rarely been addressed in the liter- 
ature so far (exceptions are e.g. Dante et al, 2005; Oster 
et al, 2005): Any hardware device that is complex enough to 
serve as a useful neural modeling tool is useless without an 
appropriate software environment that implements a reason- 
able methodological framework for its operation. For any 
developed neuromorphic modeling platform, hardware and
software have to form a functional unit. Moreover, the
methods needed to make the advantages of a neuromorphic
device accessible to non-hardware experts do not only
concern the actual operation of the device itself. Rather,
its design process already needs to be supported and
influenced by preparatory studies, e.g. with virtual
versions of the future hardware-software system.

In this publication we summarize the FACETS efforts
to create a comprehensive methodological framework. It
provides a workflow that aims to make the innovative FACETS
wafer-scale hardware system a generic modeling tool that is
applicable to a wide range of neuroscientific questions and
accessible to the neuroscientific community.

Structure of this Article 

This introduction is followed by a description of the com- 
plete neuromorphic modeling infrastructure. This includes 
both the utilized hardware devices and the workflow that is 
the focus of this paper, i.e. the framework of methods and
software modules that have been developed for the design- 
assistance, the benchmarking and the actual operation of the 
platform. A third section presents data and results that pro- 
vide a proof of functionality for the concept as a whole. 
Various components of the workflow are evaluated, and the 
performance of benchmark model experiments performed 
with the complete system is studied and analyzed. The last 
section discusses the state of validation of the presented 
framework as well as its advantages and limitations consid- 
ering alternative approaches. Implications and plans for fu- 
ture work and new perspectives arising from the presented 
achievements are outlined. 



2 The Workflow Components: Modules and Methods 

The following section provides an overview of the
complete infrastructure that has been developed to realize a
novel neural modeling concept built around the FACETS 
neuromorphic hardware system. For this purpose, the neu- 
romorphic device itself is presented in Section 2.1 on a 
level of detail that is appropriate to the method descriptions 
that follow. These methods are either implemented by or di- 
rectly rely on an innovative software framework, which will 
be explained in Section 2.2 by means of its structure and 
concepts. A significant achievement for the targeted design 
and development of a harmonizing hardware-software unit
forming the modeling platform was the collection of a set of 
literature-based benchmark model experiments, as summa- 
rized in Section 2.4. 

The workflow that has been developed around these 
three main components is schematically depicted in Fig- 
ure 1: The library of dedicated neuroscientific benchmark 
models, including descriptions and measures to evaluate 
their correct functionality, has been established by members 
of the FACETS research project (FACETS, 2010). For any 
model from within this set, a description on the basis of the 
simulator-independent modeling language PyNN (see Sec- 
tion 2.2.1) is available. The mentioned translation software 
stack performs an automated conversion of these scripts into 
appropriate data for the configuration and control of differ- 
ent hardware or software back-ends. The same stack also 
re-translates the resulting hardware output into the domain 
of its biological interpretation. During the development and 
optimization phase of the FACETS wafer-scale hardware
system, an elaborate virtual version of the device (see Sec- 
tion 2.3) serves as a test bench for the development and tun- 
ing of all involved translation software modules. 

In addition to the virtual wafer-scale device, a purely 
chip-based neuromorphic system (see Section 2.1.5) pro- 
vides important information about characteristics of cir- 
cuits planned to be implemented in the wafer-scale system. 
These ASICs thereby support the wafer-scale design pro- 
cess and the development of possible strategies to compen- 
sate unavoidable phenomena like transistor-level variations 
or noise. The outputs of all applied hardware or virtual hard- 
ware back-ends are compared with the target output descrip- 
tions included with the models in the benchmark library and 
with reference experiments on pure software simulators. The 
remaining differences are analyzed, as is exemplarily pre- 
sented in Section 3.1. 
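
The comparison step described above can be pictured with a minimal sketch. It assumes, purely for illustration, that every back-end returns a dictionary mapping neuron indices to lists of spike times in biological milliseconds. The function names and the rate-based measure are hypothetical and do not represent the evaluation measures shipped with the benchmark library.

```python
def mean_rate_hz(spike_trains, duration_ms):
    """Mean firing rate per neuron in Hz (spike times in biological ms)."""
    n_spikes = sum(len(times) for times in spike_trains.values())
    return 1000.0 * n_spikes / (len(spike_trains) * duration_ms)


def relative_deviation(hardware, reference, duration_ms):
    """Relative difference of the hardware mean rate from the reference."""
    ref = mean_rate_hz(reference, duration_ms)
    hw = mean_rate_hz(hardware, duration_ms)
    return (hw - ref) / ref


# Hypothetical toy data: two neurons recorded for 1000 ms on each back-end.
reference = {0: [100.0, 300.0, 500.0, 700.0], 1: [200.0, 400.0, 600.0, 800.0]}
hardware = {0: [102.0, 305.0, 498.0], 1: [199.0, 410.0, 590.0, 805.0]}

deviation = relative_deviation(hardware, reference, duration_ms=1000.0)
print(f"relative rate deviation: {deviation:+.1%}")
```

A real evaluation would use the model-specific measures stored with each benchmark; the mean rate is only the simplest conceivable stand-in.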

In an ongoing optimization flow, the benchmark mod- 
els are repeatedly mapped to the still evolving hardware 
substrate with the likewise continuously optimized software 
framework. The iteratively applied result analyses provide 
the foundation for improvements that close the workflow
loop: The hardware design, the biology-to-hardware
translation modules and optionally even the models themselves
are modified such that the functional effects of remaining
distortions caused by the model-to-hardware mapping
process are minimized.

Fig. 1: Optimization workflow towards neural modeling with neuromorphic hardware devices. The main components are
1. the highly configurable FACETS neuromorphic hardware devices, 2. the software module stack that performs an
automated translation of neural network experiments described with the modeling language PyNN into corresponding hardware
configuration and control patterns, and 3. a benchmark library that contains a collection of neuroscientific models written in
PyNN. For a detailed explanation of the complete flow and all individual steps and components see full text.

Hence, the first application of the presented workflow is 
to take novel types of hardware devices into operation. Fur- 
thermore, it can serve as a basic methodological paradigm 
for the actual target application of neuromorphic systems, 
i.e. the exploration and optimization of neural architectures 
by means of different optimization objectives. These include 
the search for computationally powerful structures or for se- 
tups that can reproduce biologically plausible dynamics. 

While this section motivates and explains the workflow
as a whole and describes all involved components and
methods, a detailed motivation for every particular choice
of method and component within the framework would
exceed the scope of this paper. The reasons for individual
methodological or design decisions can be found in the
literature referenced within the corresponding paragraphs.

2.1 The FACETS Hardware System 

In the following, the FACETS wafer-scale hardware system 
will be described with focus on conceptual and technical de- 
tails that are relevant in the context of this article. More in- 
formation on the hardware setup and circuitry can be found 
in Schemmel et al (2008), Ehrlich et al (2007), Millner et al
(2010) and Schemmel et al (2010). 

At the core of the FACETS wafer-scale hardware system
(see Figure 2) is an uncut wafer built from mixed-signal
ASICs, named High Input Count Analog Neural Network
chips (HICANNs, Schemmel et al, 2008), that provide a
highly configurable substrate which physically emulates
adaptively spiking neurons and dynamic synapses. The
intrinsic time constants of these VLSI model circuits are 
multiple orders of magnitude shorter than their biological 
originals. Consequently, the hardware model evolves with a 
speedup factor of 10^ up to 10^ compared to biological real 
time, the precise value depending on the configuration of the 
system. 

In addition to a high-bandwidth asynchronous on-wafer
event communication infrastructure, full-custom digital
off-wafer ASICs provide terminals for a packet-based
multi-purpose communication network (Scholze et al, 2010).
These so-called Digital Network Chips (DNCs) are backed
by a flexible FPGA design that handles the packet routing
(Hartmann et al, 2010). The communication infrastructure
is illustrated in Figure 5. See Section 2.1.3 for details on the
inter-chip communication scheme.

A full wafer system will comprise 384 interconnectable 
HICANNs, each of which implements more than 100,000 
programmable dynamic synapses and up to 512 neurons, re- 
sulting in a total of approximately 45 million synapses and 
up to 200,000 neurons per wafer. The exact number of
neurons depends on the configuration of the substrate, which
allows multiple neuron building blocks to be combined to
increase the input count per cell.
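
The quoted wafer totals follow directly from the per-chip figures. The short calculation below is only a consistency check; it uses the synapse array size of 114,688 per HICANN that is given in the description of the HICANN building block, and the constant names are chosen for this example.

```python
HICANNS_PER_WAFER = 384
MAX_NEURONS_PER_HICANN = 512     # upper bound, one building block per neuron
SYNAPSES_PER_HICANN = 114_688    # synapse array size stated for the HICANN

neurons_per_wafer = HICANNS_PER_WAFER * MAX_NEURONS_PER_HICANN
synapses_per_wafer = HICANNS_PER_WAFER * SYNAPSES_PER_HICANN

print(neurons_per_wafer)    # 196608, i.e. "up to 200,000"
print(synapses_per_wafer)   # 44040192, i.e. roughly 44 to 45 million
```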

2.1.1 Composition of the FACETS Hardware System

The wafer, as the main component of the FACETS wafer-scale
hardware system, has to be embedded into a framework
that provides electrical integration as well as
mechanical stability. The wafer has a diameter of 20 cm and
will be placed into an aluminum plate which also serves as
a heat sink. A multi-layer Printed Circuit Board (PCB) is
placed on top of the wafer. This PCB has to provide the
fan-out of 1500 impedance-controlled differential pairs and,
in the worst case, has to deliver a total electrical power
of 1000 Watts to the wafer. A 14-layer fine pitch board with
laser drilled micro-vias and a total size of 430 mm x 430 mm
meets these requirements. The PCB will be clamped to an
aluminum frame that is also used as a platform for
communication devices such as the 48 DNCs and the 12 FPGA
boards (see Section 2.1.3). Figure 2 shows a 3-D drawing of
the hardware composition. All depicted electrical and
mechanical components are custom-made by FACETS project
members.

ASIC: Application Specific Integrated Circuit
FPGA: Field Programmable Gate Array

Fig. 2: The FACETS wafer-scale hardware system: Wafer
(A) comprising HICANN building blocks and on-wafer
communication infrastructure, wafer bracket (B), top frame
(C) and digital inter-wafer and wafer-host communication
modules (D).

2.1.2 The HICANN Building Block

The HICANN building block shown in Figure 3 is the
neuromorphic ASIC of the FACETS wafer-scale hardware
system. The inter-chip communication scheme is explained in
Section 2.1.3.

Fig. 3: A photograph of the HICANN building block with
synapse arrays (A), neurons (B), floating-gate arrays (C) and
L1 routing (D).


Simplified, the HICANN can be divided into four
parts: the neuron circuits, their analog parameter storage
based on floating gate technology (Lande et al, 1996),
an array of 114,688 dynamic synapses, and the Layer 1 (L1)
bus system interconnecting HICANNs on a wafer. The
hardware neurons implemented by the HICANN building blocks
(Millner et al, 2010) can emulate the adaptive exponential
integrate-and-fire neuron model (AdEx, Brette and Gerstner,
2005), which can produce complex firing patterns observed
in biology (see e.g. Markram et al, 2004; Destexhe, 2009),
such as spike-frequency adaptation, bursting, regular spiking,
irregular spiking and transient spiking, by tuning a limited
number of parameters (Naud et al, 2008). The decision to
implement this particular neuron model in hardware was
motivated by the large spectrum of possible and biologically
relevant cell behavior realizable with comparably compact
circuitry. The latter fact is a crucial aspect when aiming at
the integration of large numbers of neurons in one hardware
system. For a neuromorphic implementation of
Hodgkin-Huxley cells, which consume significantly more chip
area, see e.g. Daouzli et al (2008).

The model can be described by the following two differ- 
ential equations for the membrane voltage V and the adapta- 
tion variable w and a reset condition specified further below: 

$$-C_m \frac{dV}{dt} = g_l (V - E_l) - g_l \Delta_T \exp\left(\frac{V - V_T}{\Delta_T}\right) + g_e(t) (V - E_e) + g_i(t) (V - E_i) + w , \qquad (1)$$

$$-\tau_w \frac{dw}{dt} = w - a (V - E_l) . \qquad (2)$$

$C_m$, $g_l$, $g_e$ and $g_i$ are the membrane capacitance, the
leakage conductance and the conductances for excitatory and
inhibitory synaptic inputs, where $g_e$ and $g_i$ depend on time
and on the inputs from other neurons. $E_l$, $E_i$ and $E_e$ are the
leakage reversal potential and the synaptic reversal
potentials. The parameters $V_T$ and $\Delta_T$ are the effective
threshold potential and the threshold slope factor. The time
constant of the adaptation variable is $\tau_w$. The adaptation
parameter $a$ has the dimension of a conductance.

If the membrane voltage crosses a certain threshold
voltage $\Theta$, the neuron is reset:

$$V \rightarrow V_{reset} , \qquad (3)$$

$$w \rightarrow w + b . \qquad (4)$$

The parameter b is responsible for spike-triggered adapta- 
tion. 
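
To make the interplay of Equations (1) to (4) concrete, the following sketch integrates the AdEx model in software with a forward-Euler scheme. It is an illustration only, not a model of the hardware circuit: a constant external current `i_ext_pa` stands in for the synaptic conductance terms, the parameter values are the illustrative ones from Brette and Gerstner (2005) rather than hardware calibration data, and the spike threshold of 0 mV, the reset to the leakage potential and the step size are assumptions made for this example.

```python
import math


def simulate_adex(i_ext_pa, t_stop_ms=500.0, dt_ms=0.05):
    """Forward-Euler integration of Eqs. (1)-(4); returns spike times in ms."""
    # Illustrative parameters (Brette and Gerstner, 2005), not hardware values.
    c_m, g_l, e_l = 281.0, 30.0, -70.6     # pF, nS, mV
    v_t, delta_t = -50.4, 2.0              # mV, mV
    tau_w, a, b = 144.0, 4.0, 80.5         # ms, nS, pA
    theta, v_reset = 0.0, -70.6            # mV, mV (assumed for this sketch)

    v, w, spikes = e_l, 0.0, []
    for step in range(int(round(t_stop_ms / dt_ms))):
        # Eq. (1) solved for dV/dt; synaptic terms replaced by i_ext_pa.
        dv = (-g_l * (v - e_l)
              + g_l * delta_t * math.exp((v - v_t) / delta_t)
              - w + i_ext_pa) / c_m
        # Eq. (2) solved for dw/dt.
        dw = (a * (v - e_l) - w) / tau_w
        v += dt_ms * dv
        w += dt_ms * dw
        if v >= theta:
            # Eqs. (3) and (4): voltage reset and spike-triggered adaptation.
            v = v_reset
            w += b
            spikes.append((step + 1) * dt_ms)
    return spikes


spikes = simulate_adex(i_ext_pa=1000.0)
isis = [t2 - t1 for t1, t2 in zip(spikes, spikes[1:])]
print(len(spikes), isis[0], isis[-1])
```

With these values the inter-spike intervals lengthen over time, which is the spike-frequency adaptation produced by the parameter b. Units are chosen such that nS times mV gives pA and pA divided by pF gives mV/ms.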

A neuron can be constructed out of up to 64 so-called 
denmem circuits, each implementing the dynamics of the 
AdEx model and being connected to up to 224 synapses. 
This way a neuron could have synaptic inputs from up to 
14,336 other cells. Additionally, depressing and facilitating 
mechanisms of short-term synaptic dynamics (for a review 
see Zucker and Regehr, 2002) are implemented. A purely 
chip-based FACETS hardware implementation of this fea- 
ture is described and applied in Bill et al (2010). 
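
The trade-off between neuron count and input count can be tabulated with a few lines of code. The sketch assumes that all neurons on a HICANN are built from the same number of denmem circuits and that the 512 circuits are partitioned without remainder losses other than integer division; the function name is hypothetical.

```python
DENMEMS_PER_HICANN = 512
SYNAPSES_PER_DENMEM = 224
MAX_DENMEMS_PER_NEURON = 64


def neuron_budget(denmems_per_neuron):
    """Return (neurons per HICANN, maximum synaptic inputs per neuron)."""
    if not 1 <= denmems_per_neuron <= MAX_DENMEMS_PER_NEURON:
        raise ValueError("a neuron uses between 1 and 64 denmem circuits")
    neurons = DENMEMS_PER_HICANN // denmems_per_neuron
    inputs = SYNAPSES_PER_DENMEM * denmems_per_neuron
    return neurons, inputs


for k in (1, 8, 64):
    print(k, neuron_budget(k))
```

At one denmem per neuron a chip offers 512 neurons with 224 inputs each; at 64 denmems per neuron it offers 8 neurons with 14,336 inputs each.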

A general limitation of neuromorphic implementations 
of cell models is the fact that configurable parameter val- 
ues will always have limited ranges. The value ranges of all 



AdEx parameters configurable in the hardware implementa- 
tion have been designed such that the complete set of bio- 
logically relevant firing patterns distinguished e.g. in Naud 
et al (2008) can be reproduced. Whether this design goal has
been fully achieved is currently being studied with HICANN
prototypes (see also Section 3.4).

Hebbian Learning in the FACETS Hardware

Long-term Hebbian learning in the FACETS hardware devices is
implemented in every synapse as spike-timing-dependent
plasticity (STDP, reviewed e.g. in Morrison et al, 2008). To
ensure high flexibility in terms of mappable neuronal
networks, each neuron in hardware needs an appropriate number
of synaptic inputs. However, due to limited die area, a
trade-off between the number of synapses and the chip
resources for a single synapse has to be made.

To achieve a minimal circuit size for the synapses, lo- 
cal correlation measurements and the local synaptic weight 
storage are separated from global weight update controllers 
(Schemmel et al, 2007, 2006). Causal and acausal correla- 
tions between pre- and post-synaptic spikes determine the 
temporal factor of the STDP rule described in Schemmel 
et al (2004) and are accumulated locally until they are pro- 
cessed by the update controller. Synaptic weights are stored 
locally as digital values with a four-bit resolution each. This 
resolution is again a trade-off between precision and chip 
resources and requires several correlated events to reach the 
next discrete weight value. If a sufficient amount of corre- 
lations is accumulated, the discrete weight is updated by 
the update controller. Since many synapses share one update
controller, a weight update is performed periodically with a
frequency that has an upper limit determined by the circuitry
(Schemmel et al, 2006). Since a reduced symmetric
nearest-neighbor spike pairing scheme turned out to be one
feasible approach for describing biological measurements
(Burkitt et al, 2007; Morrison et al, 2008), this specific
plasticity mechanism has been chosen for implementation in
hardware to further reduce the size of a synapse. Update
controllers modify the synaptic weights using look-up tables
that list, for each discrete weight value, the resulting
weight value in case of measured causal or acausal
correlations. These look-up tables can be adapted to the
weight-dependent factor of any STDP rule.
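
The division of labor between local accumulation and a shared, table-driven update controller can be sketched as follows. Everything in this example is hypothetical: the exponential correlation measure, the time constant, the evaluation condition with its threshold, and the simple increment/decrement tables are assumptions chosen for readability, not the circuit behavior described in the cited hardware publications.

```python
import math

W_MAX = 15  # four-bit weights take the values 0..15

# Hypothetical look-up tables: one target weight per discrete weight value.
CAUSAL_LUT = [min(w + 1, W_MAX) for w in range(W_MAX + 1)]
ACAUSAL_LUT = [max(w - 1, 0) for w in range(W_MAX + 1)]


class Synapse:
    def __init__(self, weight):
        self.weight = weight
        self.causal = 0.0    # locally accumulated causal correlation
        self.acausal = 0.0   # locally accumulated acausal correlation

    def accumulate(self, dt_ms, tau_ms=20.0):
        """Record one spike pair; dt_ms = t_post - t_pre."""
        if dt_ms > 0:
            self.causal += math.exp(-dt_ms / tau_ms)
        elif dt_ms < 0:
            self.acausal += math.exp(dt_ms / tau_ms)


def update(synapse, threshold=2.0):
    """One controller pass: apply a table entry if enough correlation exists."""
    if synapse.causal - synapse.acausal > threshold:
        synapse.weight = CAUSAL_LUT[synapse.weight]
    elif synapse.acausal - synapse.causal > threshold:
        synapse.weight = ACAUSAL_LUT[synapse.weight]
    else:
        return False
    synapse.causal = synapse.acausal = 0.0
    return True


syn = Synapse(weight=7)
for _ in range(3):
    syn.accumulate(dt_ms=1.0)   # three causal pairs, 1 ms apart
update(syn)
print(syn.weight)
```

Replacing the two lists with other tables changes the weight dependence of the rule without touching the accumulation logic, which mirrors the flexibility the look-up tables provide.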

Despite its global weight update controllers, the STDP 
mechanism of the FACETS hardware has to be considered 
local to every synapse. The implementation of this
particular model represents a project-wide decision on the most
promising mechanism to be cast into silicon, taken in the
early phase of FACETS. Recent developments in the
modeling of learning and self-organization in neural networks
(see e.g. Sjöström et al, 2008; Pfeiffer et al, 2010) combine
such local rules with various global mechanisms like the
reward-based modulation of large groups of synapses. With
respect to more complex and relevant plasticity mechanisms,
an extension to STDP rules with additional input parameters,
e.g. membrane potentials, spike rates or global reward
signals, is currently under development.

Parameter Memories

In contrast to most other systems, the
FACETS wafer-scale hardware deploys analog floating gate
memories similar to cells developed by Lande et al (1996) as 
storage devices for the analog parameters. Due to the small 
size of these cells, most parameters can be provided individ- 
ually for a single neuron circuit. This way, matching issues 
can be counterbalanced, and different types of neurons can 
be implemented on a single chip. 

As a starting point for the parameter ranges, parameters 
from Brette and Gerstner (2005) and Destexhe et al (1998) 
have been used. The chosen ranges allow leakage time
constants $\tau_{mem} = C_m / g_l$ at an acceleration factor of 10^
between 1 ms and 588 ms and an adaptation time constant
$\tau_w$ between 10 ms and 5 s in terms of biological real time.
The parameters used by Pospischil et al (2008), for example,
lie well within this range.
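
Because the hardware runs accelerated, every biological time constant corresponds to a much shorter physical one. The conversion below is a minimal sketch; the speedup of 10^4 is an assumed example value, and the capacitance and conductance used in the usage line are the illustrative AdEx values from Brette and Gerstner (2005), not hardware quantities.

```python
SPEEDUP = 1e4  # assumed acceleration factor for this example


def bio_to_hw(t_bio_s, speedup=SPEEDUP):
    """Convert a duration from biological time to hardware time (seconds)."""
    return t_bio_s / speedup


def membrane_time_constant(c_m_farad, g_l_siemens):
    """Leakage time constant tau_mem = C_m / g_l in seconds."""
    return c_m_farad / g_l_siemens


tau_bio = membrane_time_constant(281e-12, 30e-9)   # about 9.4 ms
print(tau_bio, bio_to_hw(tau_bio))
print(bio_to_hw(1e-3), bio_to_hw(588e-3))          # range limits in hardware time
```

Under this assumption the biological range of 1 ms to 588 ms maps to roughly 100 ns to 58.8 µs on the chip.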

A substantial amount of digital memories is integrated 
in the chip, dominated by the synapse RAM. Each of the 
114,688 synapses has 8 bit memory cells for weight and
address storage. For the whole wafer, the synapse RAM alone
is 38 MB large. Figure 4 shows the partitioning of the pa- 
rameter memory on a HICANN building block. To compare 
the analog floating gates to normal digital memory, each cell 
has been counted as 10 bit, since this is the number of bits 
needed to program it. 




Fig. 4: Sector diagram of the parameter space to configure 
one HICANN chip. For a full wafer, the configuration data 
volume is 44 MB large. 



2.1.3 Communication Infrastructure

The communication infrastructure of the FACETS wafer-scale
hardware is illustrated in Figure 5. Pulse communication
is generally based on the digital transmission of neural
events representing action potentials, but two network
layers can be distinguished. An asynchronous, serial
protocol named Layer 1 (L1), utilized by the HICANNs at
wafer level, provides intra-wafer action potential
transmission on a high-density direct interconnection grid. A
second layer, named Layer 2 (L2), deploys the DNCs and FPGAs
for synchronous, packet-based intra- and inter-wafer
communication and, compared to L1, establishes a more flexible
routed network of lower density. To cope with inevitable
jitter in routing delay, a time stamp is transmitted together
with the address within the data packets of this network. A
PC cluster that handles the mapping, configuration and
control process described in Section 2.2 as well as the playback
and recording of external stimuli to the neural network is
connected to the FPGAs via multi-Gigabit Ethernet.

Activity is injected into the L1 network in the form
of 6 bit serial pulse address packets by neurons that con- 
nect to the horizontal buses. Sparsely populated passive 
switch matrices at the intersections of horizontal and ver- 
tical buses pass the data to the vertical buses. Further sparse 
switch matrices connect to horizontal lines feeding synapse 
drivers that act as data sinks to the network. While crossing 
HICANN block boundaries the signals are refreshed by re- 
peater circuits with active re-timing that are capable of driv- 
ing the signals across one HICANN block. The sparseness 
of the switch matrices is chosen such that the repeater cir- 
cuits are not overloaded while still providing maximum flex- 
ibility for implementing various neural network topologies 
(see Fieres et al, 2008 and Schemmel et al, 2010 for more 
information on the underlying design decisions and analyses 
of the resulting limitations). 

Connectivity between the HICANN blocks is estab- 
lished by edge connecting them in the layout. As illustrated 
in Figure 5, this is only possible for eight HICANNs located 
within one reticle. A reticle is the largest producible unit on 
the wafer and no connections can be formed between reticles 
during standard CMOS fabrication. Wafer-scale connectiv- 
ity is obtained using a post-processing method developed in 
the FACETS project. It offers two additional routing layers 
that can cover the whole wafer. By means of this technique, 
an inter-reticle connection pitch well below 10 µm can be
achieved which facilitates the required connectivity. Further- 
more, large landing pads are formed by the post-processing 
that connect the wafer to the system PCB via elastomeric 
stripe connectors (see Figure 2 and Schemmel et al, 2010). 

Fig. 5: Communication structure on a wafer module of the FACETS wafer-scale hardware system. Neural activity is
transported horizontally (dark gray) and vertically (light gray) via asynchronous L1 buses on the HICANN building blocks.
Repeater circuits at the edges of these blocks allow for a distribution of the buses over the whole wafer. Off-wafer
connectivity is established by the L2 network via DNCs and FPGAs. It interfaces the L1 buses on the HICANN building blocks.
Several wafer modules can be interconnected using routing functionality between the FPGAs via Ethernet switches.

These stripe connectors are used to deliver all required
power to the wafer. Additionally, they connect high-speed
communication signals between the HICANNs and the
DNCs (for completeness, it should be noted that analog
signals, e.g. selectable neuron membrane voltages, are also
transported through the stripe connectors). This high-speed
communication interface transports configuration data as
well as the above-mentioned L2 data packets. L2/L1 protocol
conversion is performed inside the HICANN blocks, where L2
activity can either be injected into or read from the L1
network (see Figure 5). The transport of the L2 packets is
handled by the DNCs, which also implement a time-stamp based
buffering and heap-sort algorithm (Scholze et al, 2010).
Together with routing logic inside the FPGAs, the DNC-FPGA
L2 network fulfills the QoS demands (Philipp et al, 2009)
for spiking neural networks, i.e. a constant delay at a low
pulse loss rate. This is also true for inter-wafer
connections routed through Ethernet switches connected to the
FPGAs.
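
The idea behind time-stamp based buffering can be illustrated in software: events that arrive out of order are held in a heap and released a fixed delay after their time stamp, so that routing jitter is turned into a constant latency. The class below is a sketch under that reading; its name, interface and integer time base are hypothetical, and Python's `heapq` merely stands in for the hardware sorting logic of the DNCs.

```python
import heapq


class ReorderBuffer:
    """Release pulse events in time-stamp order after a fixed delay."""

    def __init__(self, delay):
        self.delay = delay
        self._heap = []

    def push(self, timestamp, address):
        # Each event becomes due a constant delay after its time stamp.
        heapq.heappush(self._heap, (timestamp + self.delay, address))

    def pop_due(self, now):
        """Return all (release_time, address) pairs that are due at `now`."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap))
        return due


buf = ReorderBuffer(delay=10)
for ts, addr in [(5, 1), (2, 2), (9, 3)]:   # arrival order differs from time order
    buf.push(ts, addr)
print(buf.pop_due(now=15))
print(buf.pop_due(now=19))
```

An event whose jitter exceeds the chosen delay would arrive after its release time; in a real system such events contribute to the pulse loss rate mentioned above.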

2.1.4 Host Interface

The packet communication between wafer and host com- 
puter passes through several layers: DNCs, FPGA controller 
boards and a Gigabit Ethernet layer (Norris, 2003) have to 
be traversed. As each of the twelve FPGA controller boards 
(see C in Figure 2) comprises two Gigabit ports dedicated 
for host communication, a total bandwidth of 24 Gbit/s can be
achieved. Standard networking switches concentrate these 
links into the required number of 10GBase-LX4 (Horak, 
2007) upstream ports. A standard PC cluster equipped with 
adequate network interface cards handles the traffic. A cus- 
tom design ARQ^- style (Fairhurst, 2002) protocol provides 
a reliable communication channel between the host com- 
puter and the hardware system. The FPGA controller boards 
act as remote terminals for these ARQ communication chan- 
nels, but also provide system control functionality. 

During experiments, most communication data - basically
spike events - flow directly between host computer
and FPGA controller boards. In contrast to this, in the initial
setup stage almost all traffic - i.e. the system configuration
data - is dedicated to wafer communication. In this case, the
FPGA controllers act as simple transmission nodes between
host computer and wafer. Both operational stages impose
high demands on the communication bandwidth. The initial
configuration space consumes around 50 MB (see Figure 4).
Every spike event is represented by a 15-bit time
stamp and a 12-bit data field, comprising both DNC and
HICANN identifiers. Thus, during an experiment approximately
1 GEvent/s can be transported to and from the host
computer. At a speedup factor of 10^4, the corresponding
total spike rate in the biological time domain is 100 kHz per
wafer.
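The quoted figures can be checked with a short back-of-envelope calculation. The sketch below uses only the numbers given in the text (twelve boards, two Gigabit ports each, 15 + 12 bits per event); the speedup factor of 10^4 is an assumption chosen to be consistent with the stated 100 kHz, and protocol overhead is ignored, so the results are upper bounds rather than measured rates.

```python
# Back-of-envelope budget for the host link (overhead-free upper bound).
N_FPGA_BOARDS = 12
PORTS_PER_BOARD = 2
PORT_RATE_BIT_S = 1e9          # one Gigabit Ethernet port
BITS_PER_EVENT = 15 + 12       # time stamp + data field
SPEEDUP = 1e4                  # assumed acceleration factor


def host_link_budget():
    """Return (bit rate, event rate in hardware time, event rate in biological time)."""
    total_bit_s = N_FPGA_BOARDS * PORTS_PER_BOARD * PORT_RATE_BIT_S
    events_hw_s = total_bit_s / BITS_PER_EVENT      # hardware time domain
    events_bio_s = events_hw_s / SPEEDUP            # biological time domain
    return total_bit_s, events_hw_s, events_bio_s


total, ev_hw, ev_bio = host_link_budget()
print(f"{total / 1e9:.0f} GBit/s, {ev_hw / 1e9:.2f} GEvent/s, {ev_bio / 1e3:.0f} kHz")
```

The raw payload rate comes out slightly below 1 GEvent/s and slightly below 100 kHz in biological time, which matches the rounded values in the text.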

To meet these requirements set by the hardware scale,
acceleration factor and modeling constraints, a highly scalable
software implementation of the communication protocol
was developed (see Section 2.2.10 and Schilling, 2010).
This multi-threaded protocol stack already provides a
zero-copy API to the upper software layers.

Furthermore, to support future applications, such as in- 
terfacing the FACETS hardware system to simulated envi- 
ronments which provide sensor output related to motor in- 
put, low round-trip times between these components are cru- 
cial. Such classes of in-the-loop experiments demand low la- 
tency communication and high bandwidth at the same time. 

2.1.5 Chip-Based Neuromorphic System

On the development path towards the FACETS wafer-scale
hardware platform, a purely chip-based neuromorphic system
has been designed and built (Schemmel et al, 2006,
2007) and is in active use (Kaplan et al, 2009; Brüderle and
Müller et al, 2009; Brüderle et al, 2010; Bill et al, 2010).



It implements time-continuous leaky integrate-and-fire cells
with conductance-based synapses and both a short-term and
a long-term plasticity mechanism as described above for
the wafer-scale device. Up to 16 of these ASICs, each of
which provides 384 neurons and 10^5 configurable and plastic
synaptic connections, can be operated individually or
interconnected via a supporting backplane board. This board
is connected via a single Gigabit Ethernet link to a host
computer, through which multiple users can access and use
the neuromorphic devices in parallel. Remote access to the
chips via the Internet, together with an available PyNN
interface for setting up and running experiments (see
Section 2.2.1 and Brüderle and Müller et al, 2009), already
makes this system a tool that is used for neuromorphic model
exploration by users from various countries. Many circuit
design strategies for the wafer-scale system are implemented
for testing purposes in this chip-based device, including the
STDP correlation measurements (see Section 2.1.2) located in
every individual synapse. Basic plasticity studies supporting
the design of the wafer-scale system, some of which are
outlined in Section 2.5.3, incorporate investigations on the
basis of experimental results from the chip-based devices.

QoS: Quality of Service

2.2 Software Framework 

Figure 6 shows the stack of software modules that will be 
described in the following. 







Fig. 6: Schematic of the hardware abstraction layer, i.e. the 
stack of software modules for the automated and bidirec- 
tional translation between PyNN model descriptions and ap- 
propriate hardware configuration and control patterns. The 
individual modules are: A Python control layer, a mapping 
layer that operates on a graph-based data container (Graph- 
Model), and low-level layers that deliver the generated hard- 
ware configuration patterns and control sequences via a ded- 
icated communication protocol. 



Its components seamlessly interact in performing an 
automated translation of arbitrary neural network experi- 
ment descriptions into appropriate data for hardware con- 
figuration and control. The same stack also automatically 
re-interprets the acquired hardware output into its biolog- 
ical interpretation. The top-level interface offered to hard- 
ware users to describe neural network setups is based on the 
simulator-agnostic modeling language PyNN. The concept 
of this approach and its advantages, especially for neuromor- 
phic system operation, will be described in Section 2.2.1. 

The process of mapping a PyNN description onto the 
configuration space of the FACETS hardware systems, in- 
cluding dedicated representation formats, will be described 
in Sections 2.2.2 to 2.2.8. Sections 2.2.9 and 2.3 focus on 
the mapping analysis plus its testing and optimization on the 
basis of an elaborate virtual version of the wafer-scale hard- 
ware system. The special performance requirements for the 
low-level host-to-hardware communication software and the 
implemented corresponding solutions are outlined in Sec- 
tion 2.2.10. 

2.2.1 PyNN & NeuroTools 

PyNN is a simulator-independent, Python-based language
designed for describing spiking neural network models 
(Davison et al, 2008). It offers functions and classes for the 
setup and control of experiments, and it provides standard 
cell models as well as standardized dimension units. PyNN 
supports various software simulators like NEURON (Hines 
and Carnevale, 2006; Hines et al, 2009), NEST (Gewaltig 
and Diesmann, 2007; Eppler et al, 2008), Brian (Goodman 
and Brette, 2008) and PCSIM (Pecevski et al, 2009). With 
PyNN, which is open source and well documented, a user 
can set up a neural network model, run it on any of the sup- 
ported back-ends without changing the code, and directly 
compare the results. This provides the possibility to con- 
veniently port experiments between different simulators, to 
transparently share models and results, and to verify data 
acquired from different back-ends (see Figure 7). 
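The back-end-agnostic idea can be illustrated without any simulator installed. The following toy sketch is deliberately not the PyNN API: the `EulerBackend` class, the `build_and_run` function and all parameter values are hypothetical stand-ins. It only shows the pattern that PyNN standardizes, namely one model description executed unchanged on interchangeable back-ends, followed by a direct comparison of the results.

```python
class EulerBackend:
    """Hypothetical back-end: integrates a leaky integrate-and-fire cell
    with forward Euler at a back-end-specific time step (in ms)."""

    def __init__(self, name, dt_ms):
        self.name = name
        self.dt_ms = dt_ms

    def run_lif(self, tau_m, v_rest, v_thresh, v_reset, drive, t_stop):
        """Return spike times (ms) for a constant drive (expressed in mV)."""
        v, spikes = v_reset, []
        n_steps = int(round(t_stop / self.dt_ms))
        for step in range(1, n_steps + 1):
            v += self.dt_ms / tau_m * (v_rest + drive - v)
            if v >= v_thresh:
                spikes.append(step * self.dt_ms)
                v = v_reset
        return spikes


def build_and_run(backend):
    """The 'model script': identical for every back-end."""
    return backend.run_lif(tau_m=10.0, v_rest=-65.0, v_thresh=-50.0,
                           v_reset=-65.0, drive=20.0, t_stop=100.0)


backends = [EulerBackend("fine", dt_ms=0.1), EulerBackend("coarse", dt_ms=1.0)]
results = {b.name: build_and_run(b) for b in backends}
for name, spikes in results.items():
    print(name, len(spikes), "spikes, first at", round(spikes[0], 1), "ms")
```

Because `build_and_run` never refers to a specific back-end, differences in the output can be attributed to the back-ends themselves. This is the property that makes a common description language useful for verifying a hardware back-end against software references.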

The integration of the operating software framework
for the FACETS hardware system into the PyNN concept
(Brüderle and Müller et al, 2009; Brüderle, 2009) is a crucial
aspect of the presented neuromorphic workflow. One
important motivation for this approach is to create a bridge
between the communities of neuromorphic engineers and
neural modelers, who have been working in rather separate
projects so far. The back-end agnostic concept of PyNN,
now also offering the possibility to port existing experiments
between the supported software simulators and the FACETS
hardware system, makes it possible to benchmark and verify
the hardware model. The API of PyNN is easy to learn,
especially for scientists who have already worked with
software simulators. Hence, PyNN represents an optimal way to
provide non-hardware experts with a convenient interface to
the FACETS neuromorphic devices. In general, PyNN
interfaces to neuromorphic systems make it possible to
formulate transparent tests, benchmarks and feature requests,
and can therefore influence and boost biologically oriented
hardware development. They might, eventually, support the
establishment of such emulation devices as useful modeling
tools.

Fig. 7: Schematic of the simulator-independent modeling language PyNN. Like various established software simulators, the
FACETS neuromorphic hardware systems have been integrated into the PyNN unification and standardization concept.

On top of PyNN, a library of analysis tools called Neuro- 
Tools (NeuroTools, 2008) is being developed, which builds 
upon the interface and data format standards, but also ex- 
ploits the possibility to incorporate third-party Python mod- 
ules e.g. for scientific computing and plotting (Oliphant, 
2007; Jones et al, 2001; Langtangen, 2008; Hunter, 2007). 
Thus, for all supported software simulators and for the 
FACETS neuromorphic hardware systems, all stages of neu- 
ral modeling experiments - description, execution, result 
storage, analysis and plotting - can be performed from 
within the PyNN and NeuroTools framework. 

Simulations as Reference for Translation and Calibration 
The hardware-specific PyNN approach incorporates quantitative
bidirectional translation methods between the neuromorphic
system dynamics and the biological domain, both
in terms of electrical variables and the different time
domains. This translation incorporates calibration routines that
minimize the impact of transistor-level fixed-pattern noise
on the behavior of neural and synaptic circuits. The
translation and calibration scheme developed for the FACETS
hardware systems directly involves reference software
simulations for the biologically relevant gauging of hardware
parameters, heavily exploiting the PyNN paradigm of unified
setup descriptions. Section 2.2.7 provides more details
on this.



2.2.2 Mapping Process 

The mapping process determines a valid routing network 
configuration and parameter value set as initial setup data 
for the FACETS hardware system. This takes into account 
topology constraints between hardware blocks such as con- 
nectivity, connection counts, priorities and distances as well 
as source/target counts. Figure 8 depicts the single steps of 
the mapping process as described by Ehrlich et al (2010). 

The mapping is accomplished in the three main steps of
Placement, Routing and Parameter Transformation &
Calibration, with appropriate Pre- and Postprocessing of
the configuration data. As the three main steps are explained
in more detail in the following sections, we only briefly
summarize the functionality of the remaining parts here.

Starting with a neural architecture defined via PyNN, the
first mapping step of Preprocessing reads in a description of
the hardware (see Section 2.2.3), described using the novel
query language GMPath (see Section 2.2.4). It sets up an
internal representation for both the hardware and the
biological model in the form of a directed graph called
GraphModel (see Section 2.2.4). Optionally, a so-called
PreMapping netlist of the biological model can be streamed out
into a file. Following placement and routing, the same applies
to the Postprocessing with a PostMapping netlist, which
includes the possibility to obtain a PyNN script that
represents the (possibly distorted) network ultimately realized on
the hardware back-end.

The individual steps of the process are automatically
initiated and partly controllable via the PyNN module for the
FACETS hardware system. Furthermore, a stand-alone
software tool named GraViTo is provided for the analysis of the
mapping results (see Section 2.2.9).

Fig. 8: Mapping PyNN neural network model descriptions
onto the configuration space of the wafer-scale hardware
system. The three main processing steps, all operating on
one unified data container (GraphModel), are (a) the placing
of neurons onto the available circuitry, (b) the realization
of synaptic connections by appropriately configuring
the available routing infrastructure on the device, and (c) the
transformation of neuron and synapse parameters into
corresponding parameter values offered by the highly
configurable device. The latter step can involve calibration data to
tune individual sub-circuits such that the effect of unavoidable
transistor-level variations on the mapped models is
minimized.

2.2.3 Internal Hardware Description 

Prior to the mapping process we have to define the hard- 
ware in an abstract manner. For this purpose we utilize the 
path language GMPath to set up an appropriate GraphModel 
(both described in Section 2.2.4) as a versatile internal rep- 
resentation. 

In Figure 9 a FACETS wafer-scale hardware setup - also
applied in Ehrlich et al (2010) - is illustrated. As described
in Section 2.1, the fundamental layer of the FACETS
wafer-scale hardware is an array of reticles shown as light gray
squares, housing the HICANN circuitry that implements
neural functionality, with a second layer of DNCs above.
The third and topmost layer represents a regular grid of
FPGAs, colored dark gray.

Fig. 9: Example FACETS wafer-scale hardware setup from
Ehrlich et al (2010): 12 FPGAs control 48 DNCs, which are
connected to 384 HICANN ASICs.

2.2.4 The GraphModel Container 

A data model called GraphModel (Wendt et al, 2008)
represents both the targeted biological and the configurable
hardware structure within the mapping software. It can be
characterized as a hierarchical hypergraph and consists of
vertices (data objects) and edges (relationships between the
vertices). A vertex contains a single data value. An edge can be
one of the following types:

hierarchical: models a parent-child relationship, structuring
the model

named: forms a directed and named relation between
any two vertices in the model

hyper: assigns a vertex to a named edge, characterizing
the edge in more detail

The major advantages of this graph approach are the
implementation convenience and efficiency as well as the
flexibility to meet the complex requirements of both the
biological and the hardware model. Due to the structure of the
graph model, it can easily be (de-)serialized, providing save
and restore functionality. Via the path-based query language
GMPath (Wendt et al, 2010), information can be dynamically
retrieved from and stored to the models. The GraphModel is
used to store all information during the configuration
process, i.e. the models themselves, the data of the mapping,
routing and parameter transformation algorithms, and their
results.

Figure 10 shows the graph model representation of a
biological network (called BioGraph) and its hardware
representation (called HardwareGraph), connecting elements via
named edges after a placement step.

Fig. 10: A simplified example of two graph models, assigning
neural elements to hardware components.

Fig. 11: An example GMPath request to retrieve all assigned
neurons of the biological model.

The Query Language GMPath
To retrieve information from and propagate data to the graph
models, the path-based query language GMPath was developed,
providing a universal interface for placing and routing
algorithms as well as for configuration, visualization and
analysis tools (Wendt et al, 2010). Based on so-called
navigational steps, a path request can enter the model at any
point (node or edge) and iteratively addresses the logical
environment by

• shifting the focus hierarchically up- or downward 

• shifting the focus back and forth along edges 

• filtering according to names

• concatenating sub-queries 

The results are lists of nodes or edges and serve the
requesting software as model information input. Because of
their string-based format and the ability to address nodes or
edges unambiguously, the queries can be created conveniently
and dynamically at runtime and can be used to extend and
modify the models.

Figure 11 shows, as an example, the successive navigational
steps of an executed path request, which enters the abstract
hardware model at its root, addresses all existing HICANN
nodes and finally follows incoming mapping edges to their
origins, the neurons of the biological model.
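The three edge types and the navigational steps can be made concrete with a small in-memory sketch. It is an illustration only: the `Vertex` and `NamedEdge` classes, the helper functions and the edge name `"mapped_to"` are hypothetical and reproduce neither the actual GraphModel implementation nor the GMPath syntax (Wendt et al, 2008, 2010). The query at the end mirrors the request described for Figure 11: enter the hardware model at its root, select all HICANN nodes, and follow incoming mapping edges back to the neurons of the biological model.

```python
class Vertex:
    """A vertex holds a single data value plus its relations."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent          # hierarchical edge (child -> parent)
        self.children = []            # hierarchical edge (parent -> child)
        self.out_edges = []           # named edges leaving this vertex
        self.in_edges = []            # named edges arriving at this vertex
        if parent is not None:
            parent.children.append(self)


class NamedEdge:
    """Directed, named relation; hyper edges attach describing vertices."""

    def __init__(self, name, source, target):
        self.name, self.source, self.target = name, source, target
        self.hyper = []               # vertices characterizing this edge
        source.out_edges.append(self)
        target.in_edges.append(self)


# Navigational steps: each maps a list of vertices to a new list.
def down(vertices):
    return [c for v in vertices for c in v.children]


def by_name(vertices, prefix):
    return [v for v in vertices if str(v.value).startswith(prefix)]


def back_along(vertices, edge_name):
    return [e.source for v in vertices for e in v.in_edges if e.name == edge_name]


# Tiny BioGraph and HardwareGraph, linked by mapping edges after placement.
bio = Vertex("BioGraph")
neurons = [Vertex(f"neuron{i}", bio) for i in range(3)]
hw = Vertex("HardwareGraph")
hicanns = [Vertex(f"HICANN{i}", hw) for i in range(2)]
spare = Vertex("DNC0", hw)
NamedEdge("mapped_to", neurons[0], hicanns[0])
NamedEdge("mapped_to", neurons[1], hicanns[0])
edge = NamedEdge("mapped_to", neurons[2], hicanns[1])
edge.hyper.append(Vertex("placement_step=1"))   # hyper edge detailing the mapping

assigned = back_along(by_name(down([hw]), "HICANN"), "mapped_to")
print([v.value for v in assigned])
```

Each navigational step takes and returns a plain list, so steps can be concatenated freely. This composability is what allows path requests to be assembled at runtime.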

2.2.5 Neuron Placement 

The process of assigning neural elements like neurons, 
synapses or their parameters to distinct hardware elements is 
called placement. It can be characterized as a multi-objective 
optimization problem, the solution of which significantly 
influences the overall mapping results. Typical algorithmic 
approaches create clusters of cells with common properties 
that are mapped to the same HICANN building blocks. Pos- 
sible optimization objectives are: 

• minimize neural input/output variability cluster-wise

• minimize neural connection count cluster-wise

• comply with parameter limitations

• comply with cluster capacities (neural capacity of hardware
elements)

• minimize routing distances

In order to achieve these objectives with user-defined
weightings in acceptable computation time, a force-based
optimization heuristic was developed. This algorithm
balances modeled forces (special implementations of the
optimization objectives) in an n-dimensional space until an
equilibrium is reached; a final separation step then assigns
data objects to clusters with affine properties. Although this
problem is NP-complete, the algorithm finds significantly
better results than a fast random placement in an acceptable
computation time.
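A minimal sketch of the idea, under strong simplifying assumptions, is given below. It is not the heuristic referred to above: it uses a single attractive force between connected neurons in one dimension, a fixed number of relaxation iterations instead of an equilibrium test, and a separation step that simply cuts the sorted positions into blocks of equal capacity. All names and values are hypothetical.

```python
def force_based_placement(positions, connections, capacity,
                          step=0.1, iterations=50):
    """Relax neuron positions under attractive forces, then cut into clusters.

    positions   -- initial 1-D coordinate per neuron (list of floats)
    connections -- set of undirected neuron index pairs (i, j)
    capacity    -- maximum number of neurons per cluster (hardware block)
    """
    pos = list(positions)
    for _ in range(iterations):
        force = [0.0] * len(pos)
        for i, j in connections:
            # Connected neurons pull each other together.
            force[i] += pos[j] - pos[i]
            force[j] += pos[i] - pos[j]
        pos = [p + step * f for p, f in zip(pos, force)]

    # Separation step: neighbours in the relaxed space share a cluster.
    order = sorted(range(len(pos)), key=lambda n: pos[n])
    return [sorted(order[k:k + capacity])
            for k in range(0, len(order), capacity)]


# Two fully connected groups whose members start interleaved.
group_a, group_b = [0, 1, 2], [3, 4, 5]
connections = {(i, j) for g in (group_a, group_b) for i in g for j in g if i < j}
start = [0.0, 2.0, 4.0, 1.0, 3.0, 5.0]
clusters = force_based_placement(start, connections, capacity=3)
print(clusters)
```

In this example the relaxation pulls each group to its own centre, so the cut places strongly connected neurons on the same block. A realistic heuristic would add further forces for the other objectives listed above and weight them against each other.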

Figure 12 illustrates a placement process, divided into 
an optimization step, which sorts the given biological net- 
work for optimal hardware utilization with regard to the in- 
put source variability, and an assignment step, defining the 
physical realization of neural elements on the hardware sys- 
tem. 

2.2.6 Connection Routing 

The routing step allocates and configures the hardware re- 
sources for establishing the synaptic connections in the al- 
ready placed BioGraph. Given the fixed amount of available 
resources it is not evident a priori whether arbitrary network 
topologies are always perfectly reproducible. 

Synaptic connections can in principle be established via
the L1 and L2 infrastructure (see Section 2.1.3). In the
approach described here, all intra-wafer connectivity is routed
exclusively on L1. The L2 network is reserved for
inter-wafer connections in a multi-wafer system.

Fig. 12: An example placement, divided into an optimization
and assignment step.

The intra-wafer routing algorithms were developed in
close cooperation with the wafer design (Fieres et al, 2008).
Some hard-wired features of the L1 infrastructure are thus
laid out to optimally suit the routing requirements. The
routing itself is performed in two stages. The first stage
establishes connections on a HICANN-to-HICANN level via the
horizontal and vertical L1 buses, mainly by configuring the
bus repeaters and sparse crossbars (see Figure 5). In the
second stage the signals are routed from the vertical L1 bus
lanes into the synapse arrays via the sparse switch matrices,
the synapse drivers and the address decoders of the
synapses, the latter not being shown in Figure 5.

The algorithms were proven in various test scenarios:
homogeneous randomly connected networks with up to
16,000 neurons, locally connected networks (according to
Tao et al, 2004) as well as a model of a cortical column
(following Binzegger et al, 2004 and Kremkow et al, 2007)
with 10,000 neurons. It turns out that in typical cases only
a small number of unrouted connections must be accepted,
mainly due to limited resources in the second routing stage.
However, it was also shown that decreasing the density of
the neuron placement generally facilitates the routing,
at the expense of a larger portion of idle hardware synapses.

The routing algorithms also proved to be applicable to
the benchmarks described later in this publication, see
Section 2.4 and Ehrlich et al (2010).

2.2.7 Parameter Transformation 



The parameter transformation maps parameters of given
neuron and synapse models into the hardware parameter
space. It is performed HICANN-wise. Biological data is
first acquired from the so-called BioGraph of the GraphModel
(see Section 2.2.4) and then transformed into a hardware
configuration, which is stored back into the HardwareGraph.
For an adequate biology-to-hardware translation
several constraints have to be considered, such as hardware
imperfections and shared or correlated parameters in
the microchip.

For the membrane circuits, a two-step procedure was
developed to translate the 18 biological parameters from
the PyNN description to the 24 electrical parameters of the
HICANN building block. The first step is to scale the
biological neuron model parameters in terms of time and voltage.
At this stage, the desired acceleration factor is chosen and
applied to the two time constants of the neuron model. Then,
the biological voltage parameters are transformed to match
the voltage levels of the HICANN building block. The
second step is to translate those intermediate values to
appropriate hardware parameters. For this purpose, each part of the
membrane circuit was characterized in transistor-level
simulations, which were used to establish the translation
functions between the scaled AdEx parameters and their
hardware counterparts.

However, due to transistor size mismatch in the hardware,
these translation functions are expected to differ from
neuron to neuron. Calibration software has been developed
to automatically parameterize these translation functions for
each neuron. For each neuron model parameter, the software
sends a spectrum of values to the HICANN building block
and measures the resulting membrane potentials of
the current cell. It then deduces the corresponding AdEx
parameters from these measurements and stores the value
pairs in a database. After a given number of measurement
points, the algorithm computes the relation between the
hardware parameters sent to the floating gates and the AdEx
parameters, and stores this function in the database. Figure
13 illustrates the calibration software architecture.

Once the calibration step is done, the database can be
used to automatically convert a set of biological neuron
parameters to their hardware counterparts, allowing on-the-fly
conversion of neuron parameters for the wafer-scale
hardware system.

Concerning the synapses, there are mainly two restrictions
ensuing from the chip design: 256 synapses of the
same row share the maximal conductance g_max and the
short-term plasticity mechanism, and weights are restricted to a
4-bit resolution. By averaging over all active synapses, the
transformation algorithm determines g_max and sets the
digital weights accordingly, using stochastic rounding to avoid
systematic errors.
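The discretization of synaptic weights with stochastic rounding can be sketched as follows. The rule for choosing g_max from the row average (here: twice the mean of the active weights, so that the mean lands near mid-scale) is an assumption made for illustration; the text only states that g_max is determined by averaging over all active synapses. The function names and the example conductances are hypothetical.

```python
import math
import random

WEIGHT_LEVELS = 15  # largest value of a 4-bit digital weight


def stochastic_round(x, rng):
    """Round x down or up at random so that the expected value equals x."""
    low = math.floor(x)
    return low + (1 if rng.random() < x - low else 0)


def discretize_row(weights, rng):
    """Map the analog weights of one synapse row to (g_max, 4-bit weights)."""
    active = [w for w in weights if w > 0.0]
    if not active:
        return 0.0, [0] * len(weights)
    g_max = 2.0 * sum(active) / len(active)       # assumed rule, see lead-in
    digital = []
    for w in weights:
        scaled = min(w / g_max, 1.0) * WEIGHT_LEVELS   # clip to the 4-bit range
        digital.append(stochastic_round(scaled, rng))
    return g_max, digital


rng = random.Random(1)
row = [0.0, 0.002, 0.004, 0.006, 0.02]     # hypothetical conductances
g_max, digital = discretize_row(row, rng)
print(g_max, digital)

# Stochastic rounding is unbiased: the mean of many draws approaches 7.3.
draws = [stochastic_round(7.3, rng) for _ in range(20000)]
print(sum(draws) / len(draws))
```

Deterministic rounding would shift every weight of a given value in the same direction. Rounding up with a probability equal to the fractional part keeps the expected total synaptic input of a neuron unchanged, which is the sense in which systematic errors are avoided.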
Fig. 13: Architecture of the calibration software. The main
component, the calibration controller, executes the
calibration algorithms and communicates with the hardware, the
oscilloscope and the database via dedicated interfaces. The
calibration software also incorporates an AdEx model
simulator to compare software and hardware results.
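The two-step neuron parameter translation and the per-neuron calibration described in Section 2.2.7 can be sketched as follows. Everything hardware-related in the sketch is a stand-in: the voltage ranges, the speedup factor of 10^4, the mock neuron with its hidden gain and offset, the linear form of the translation function and the dictionary used as database are assumptions for illustration, not properties of the HICANN or of the actual calibration software.

```python
def scale_to_hardware(tau_bio_ms, v_bio_mv, speedup=1e4,
                      bio_range=(-80.0, -40.0), hw_range=(0.4, 1.2)):
    """Step 1: scale time by the acceleration factor and map the
    biological voltage range (mV) linearly onto a hardware range (V)."""
    tau_hw_ms = tau_bio_ms / speedup
    frac = (v_bio_mv - bio_range[0]) / (bio_range[1] - bio_range[0])
    v_hw = hw_range[0] + frac * (hw_range[1] - hw_range[0])
    return tau_hw_ms, v_hw


class MockNeuron:
    """Stand-in for one hardware neuron with device-specific mismatch."""

    def __init__(self, gain, offset):
        self._gain, self._offset = gain, offset

    def measure(self, hw_value):
        """Return the model parameter that results from a hardware setting."""
        return self._gain * hw_value + self._offset


def calibrate(neuron, sweep):
    """Sweep hardware values, measure, and fit measured = a * hw + b."""
    measured = [neuron.measure(x) for x in sweep]
    n = len(sweep)
    mean_x, mean_y = sum(sweep) / n, sum(measured) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sweep, measured))
         / sum((x - mean_x) ** 2 for x in sweep))
    return a, mean_y - a * mean_x


def to_hardware_value(target, fit):
    """Step 2: invert the fitted relation for a desired model parameter."""
    a, b = fit
    return (target - b) / a


database = {}                                   # neuron id -> fitted function
neuron = MockNeuron(gain=0.93, offset=0.05)     # hypothetical mismatch
database[0] = calibrate(neuron, sweep=[0.2, 0.4, 0.6, 0.8, 1.0])

tau_hw, v_target = scale_to_hardware(tau_bio_ms=10.0, v_bio_mv=-50.0)
hw_setting = to_hardware_value(v_target, database[0])
print(tau_hw, v_target, hw_setting, neuron.measure(hw_setting))
```

Once the fit for a neuron is stored, any later conversion only needs a lookup and an inversion. This corresponds to the on-the-fly conversion mentioned in the text.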

2.2.8 Application of the Mapping Flow onto the FACETS 
Chip-Based System 

In order to further demonstrate the versatility of the
GraphModel-based mapping flow introduced in Section 2.2.4,
we briefly outline the adaptation of this procedure
to the operation of the FACETS chip-based systems (see
Section 2.1.5). This integration avoids code redundancy
by unifying the previously independent PyNN back-ends
and makes it possible to map neural architectures onto
interconnected chips beyond single-chip boundaries (Jeltsch,
2010). Due to the flexible design of the mapping framework,
the translation of the PyNN description into the biological
graph representation (see Section 2.2.2) and the placing of
biological neurons onto their hardware counterparts (see
Section 2.2.5) could be kept completely unchanged. Necessary
extensions were limited to the development of a new internal
hardware model that captures all features of the chip-based
system as well as adapted versions of the routing and the
parameter translation (described in Sections 2.2.6 and 2.2.7,
respectively) to match the different network topology and
electrical parameters. Together with the low-level event
distribution network established by Friedmann (2009), neural
network models can now be scaled to multiple chips.

2.2.9 Mapping Analysis and Visualization 

The application Graph Visualization Tool (GraViTo), as
described by Ehrlich et al (2010), aids in analyzing the mapping
results. GraViTo, as shown in Figure 14, integrates several
modules that display graph models in textual and graphical
form and gather statistical data. One can selectively access
single nodes inside the data structure and visualize their
context, dependencies and relations with other nodes in the
system.




Fig. 14: Screenshot of the GraViTo application. 



The example of the GraViTo views shows a tree view on
the left which is utilized to browse the hierarchical structure
of the graph model and examine contents and connections
of individual nodes. The 3-D view on the right provides a
virtual representation of the FACETS wafer-scale hardware
system for interactively browsing its architecture and
configuration. It also provides a global overview of the single
hardware components and the networks they form. Various
statistics such as histograms for utilization of the crossbars
or the synaptic connection lengths are gathered and can be
displayed.

Another option for a systematic mapping analysis arises 
from the previously mentioned possibility to re-translate the 
configured HardwareGraph contents via the mapping edges 
through the BioGraph into a PostMapping PyNN script. This 
script intrinsically contains all model distortions caused by 
the mapping process, e.g. lost synapses and discretized or 
clipped parameter values. Exploiting the PyNN concept, it 
can then be directly evaluated with a software simulator to 
extract possible functional consequences of the structural 
distortions, avoiding interferences with other effects like on- 
wafer communication bandwidth limitations. 

2.2.10 Hardware Low-Level Interfacing 

A specialized protocol of the class of selective ARQ
protocols is used to provide a fast and reliable communication
channel with the neuromorphic hardware device. In the OSI
model this corresponds to the transport layer.

ARQ: Automatic Repeat reQuest
OSI: Open Systems Interconnection model

Configuration and experimental data are bidirectionally
transmitted via two 10 GBit/s Ethernet links per FPGA. In
order to handle up to 2 GByte/s of traffic while keeping the
load of the host computer as low as possible, several
software techniques have been applied to the protocol
implementation. Various features of existing transport protocols,
notably TCP, have been implemented, including congestion
avoidance, RTT estimation and packet routing, to keep the
connection in a stable and bandwidth-maximizing regime.
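The principle of a selective ARQ scheme can be sketched with a small single-threaded simulation. It is a generic selective-repeat illustration, not the protocol of Schilling (2010): the window size, the deterministic loss pattern and all names are hypothetical, acknowledgements are assumed to arrive reliably, and congestion avoidance, RTT estimation and multi-threading are omitted.

```python
def selective_repeat(payloads, window=4, drop_every=3):
    """Deliver payloads in order over a lossy channel; return (data, sends).

    Every `drop_every`-th transmission is lost (use a value of 2 or more).
    """
    delivered = []            # what the receiver hands to upper layers
    rx_buffer = {}            # out-of-order frames held by the receiver
    acked = set()             # sequence numbers acknowledged to the sender
    base, sends = 0, 0        # oldest unacknowledged frame, transmissions

    while base < len(payloads):
        # Sender: (re)transmit every unacknowledged frame inside the window.
        for seq in range(base, min(base + window, len(payloads))):
            if seq in acked:
                continue      # selective: acknowledged frames are not resent
            sends += 1
            if sends % drop_every == 0:
                continue      # channel drops this transmission
            rx_buffer[seq] = payloads[seq]
            acked.add(seq)    # receiver acknowledges each frame individually

        # Receiver: release the in-order prefix; sender slides its window.
        while base in rx_buffer:
            delivered.append(rx_buffer.pop(base))
            base += 1
    return delivered, sends


data = [f"event{i}" for i in range(10)]
received, transmissions = selective_repeat(data)
print(received == data, transmissions)
```

Only the lost frames are transmitted a second time, whereas a go-back-N scheme would also resend every frame that followed a loss. On a link that is meant to run close to its bandwidth limit, this difference is the main argument for the selective variant.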

In matters of performance, the framework is divided into
three mainly independent processing threads (receiver thread
RX, sender thread TX and resend thread, see Figure 15) to
exploit the speed-up in execution offered by modern
multiprocessor systems. Performance-critical data, e.g. spike data,
can be placed in shared memory and passed to and from the
hardware abstraction layers so as to avoid unnecessary
copying. These shared data structures have to be protected
against concurrent accesses, which imposes an additional
overhead in processing time. Thus, to keep the number of
system calls and context switches small, access to data
located in shared memory is protected by means of
custom-built user-space fencing and locking methods.

Fig. 15: Configuration and runtime control steps in the
hardware abstraction layer: The Experiment module acquires the
configuration data from the Mapping Process (see Section
2.2.2 and Figure 8), generates a hardware-specific
representation of this data and triggers the transfer to the hardware
system.

RTT: Round Trip Time



A purely software-stack-based test has been developed
that establishes a reliable ARQ connection between two host
computers via 10 Gigabit Ethernet. With a hardware-specific
version of this protocol, i.e. with hardware-specific frame
sizing and protocol window size, it delivers 10 GBit/s
(Schilling, 2010).

2.3 Virtual Hardware 

An executable specification of the FACETS wafer-scale
hardware system serves as a versatile tool not only during
device design and optimization, but also as a test bench for
all involved software layers. It is a functional model that can
be used to explore the behavior and characteristics of the
real wafer-scale system in its final phase of development.

2.3.1 Implementation 

The so-called virtual hardware is a detailed simulation
of the final hardware platform and has been implemented
in C++/SystemC (Vogginger, 2010). The virtual hardware
replicates its physical counterpart in all aspects regarding
functionality and configuration space. Every module of the
real hardware has its functional counterpart in the virtual
device, where especially the interface and communication
structures accurately correspond to the physical system. It
implements all analog and mixed-signal modules such as
AdEx neurons and dynamic synapses (depressing and
facilitating), as well as all units responsible for L1 and L2 routing.
Compared to analog and RTL hardware simulations, this
model is tuned towards simulation speed using behavioral
models of all relevant functional components. However, it
is possible to replace individual modules by more
sophisticated models, all the way down to simulating single wires
on the chip.

The current implementation of the virtual hardware
differs from the real hardware system in several aspects, most
of them motivated by efficiency considerations. The executable
system specification is not operated from a host PC but
directly from higher software layers, such that the
host-to-system communication is not simulated. Furthermore, the
configuration of the HICANN building block and its
components is not conducted via packets received from L2, as
the software implementation of the used protocol is still
under development. Instead, every HICANN obtains its
configuration via direct access to the GraphModel (see
Section 2.2.4). Despite these differences the virtual hardware
remains a proper replica of the FACETS wafer-scale system,
providing equal functionality while not suffering from
hardware-specific constraints like transistor-level
imperfections from the manufacturing process.

RTL: Register Transfer Level

2.3.2 Analysis And Verification Based On Virtual Hardware

With its functionality and flexibility, the virtual hardware is 
an essential tool for the development of the software frame- 
work operating the FACETS wafer- scale hardware This in- 
cludes the PyNN interface and the placement, routing and 
parameter transformation algorithms (see Sections 2.2.1 and 
2.2.2), which can already be tested and verified despite the 
real hardware not yet being available. The development of 
a hardware system, which shall be useful in a neural mod- 
eling context, can be strongly supported already during its 
design phase by determining constraints inherent to the sys- 
tem architecture, such as communication bottlenecks or the 
effect of shared and digitized parameters. Their influence 
can be evaluated without the interference of hardware im- 
perfections or a missing calibration. Such studies build the 
basis for improvements in the hardware design or, if pos- 
sible, the development of software-based corrections. The 
virtual hardware can be used from PyNN like any other supported
software simulator, thereby also offering an early
modeler's perspective on the capabilities of the future
FACETS wafer-scale platform. Any PyNN model, in particular
the benchmark models described in Section 2.4, can
be applied to this setup. Their output can later be analyzed 
and compared to reference software simulations, revealing 
the impact of hardware constraints onto the model behav- 
ior, e.g. the loss of certain synaptic connections during the 
mapping process. 



2.4 Benchmark Model Library 

We will now present a set of experiments that serve as 
benchmarks for the previously described mapping process. 
The setups are implemented in PyNN and have been con- 
tributed by FACETS project partners. They not only cover 
various computational aspects like memory, pattern recogni- 
tion, robust information propagation in networks or dynamic 
switching between different functional modes, but also very 
different structural characteristics. 

2.4.1 Layer 2/3 Attractor Memory Model

The model used here remains faithful to the model of neo- 
cortical layers 2/3 in Lundqvist et al (2006), and in doing so 
retains the modularity that is the key aspect of this architec- 
ture (Lundqvist et al, 2010). It represents a patch of cortex 
arranged on a hexagonal topology of N_HC hypercolumns, each
separated by 500 µm, in agreement with data from cat cerebral
cortex recordings. Each hypercolumn is further subdivided
into N_MC minicolumns, and various estimates suggest
that there are about 100 minicolumns bundled into a hyper- 
column (Mountcastle, 1997; Buxhoeveden and Casanova, 
2002). For the default version of the Layer 2/3 Attractor
Memory benchmark model, a total number of N_HC = 9 hypercolumns
and a sub-sampling of N_MC = 8 minicolumns per
hypercolumn has been used.

The arrangement of the cells in the local microcircuit to- 
gether with connection probabilities is shown in Figure 16a. 



(a) L2/3 network architecture (cell types shown: pyramidal, basket, RSNP)

(b) L2/3 model with 9 HC and 8 MC each

Fig. 16: Schematic detailing the network arrangement and 
all the excitatory and inhibitory pathways between different 
cell groups and their connection densities in the L2/3 Attractor
Memory network model. (a) Connectivity densities
of the sub-sampled network model. See the text for further
description. (b) Cartoon of a network with 9 hypercolumns
(HC). Each hypercolumn has 8 circularly arranged minicolumns
(MC). The large disc at the center of each hypercolumn
represents a population of basket cells. Dashed lines show 
mutually exciting minicolumns that are distributed over dif- 
ferent hypercolumns, forming a pattern. 



In the default variant of the model, each minicolumn
consists of 30 pyramidal cells densely connected to other
pyramidal cells in the same minicolumn (P_pyr-pyr = 25%)
and two regular spiking non-pyramidal (RSNP) cells that
project to P_rsnp-pyr = 70% of the pyramidal cells. Each
hypercolumn has 8 basket cells, with each pyramidal cell in
a minicolumn targeting P_pyr-bas = 70% of neighboring basket
cells, and each basket cell targeting P_bas-pyr = 70% of
neighboring pyramidal cells. The extent of basket cell inhibition
is limited to its home hypercolumn (Douglas and
Martin, 2004). Apart from these local connections, pyramidal
cells located in different hypercolumns are also connected
globally (P_mc-mc = 17%). The cartoon in Figure 16b
shows how the minicolumns in different hypercolumns are
connected (dashed lines). We developed methods
to scale this architecture up or down by means of both
N_HC and N_MC without losing its functionality. They are
described in Section 2.5.1 and experimentally applied in
Section 3.2.

Thus, a set of mutually exciting minicolumns distributed 
over different hypercolumns represents a stored pattern or an 
attractor of the network dynamics. RSNP cells in a minicol- 
umn also receive long-range excitation. They are excited by 
distant pyramidal cells, given their home minicolumn is not 
part of the active pattern, thus inhibiting the pyramidal cells 
in the minicolumn. In this network, we can store as many 
patterns as the number of minicolumns in a hypercolumn, 
but by allowing for overlapping memory patterns the num- 
ber of stored patterns can be increased significantly. 

Figure 17 shows a raster plot of the activity of the net- 
work, when all pyramidal cells are excited by individual 
Poisson inputs of the same rate.

Fig. 17: Raster plot of characteristic activity of an L2/3 Attractor
Memory network with 9 hypercolumns and 8 attractors.



Whenever an attractor becomes stronger than the others 
(which happens randomly), it completely suppresses their 
activity for a short period of time. 

Pyramidal cells in an active attractor are in a so-called
UP-state, where their average membrane potential is a few
mV above its rest value. When plotting the trajectory of the
system in potential space, with each axis representing the
average membrane potential of all neurons inside an attractor,
a projection along the main diagonal (the line which is
equidistant to all axes) will yield a typical star-like pattern
(see Figure 18).

(b) Mean voltage trajectory
(c) Mean rate trajectory

Fig. 18: (a) Construction of phase space projection plots as
shown e.g. in (b) and (c): The trajectory in an n-dimensional
phase space (here: n = 3) is projected to a hyperplane perpendicular
to the main diagonal. (b) Trajectory projection of
the attractor network state evolving in 8-dimensional mean
voltage and (c) mean rate phase space. Axis values represent
the projected offset from a base value, which is the neuron
resting potential (in mV) for the voltage traces and 0 Hz for
the rate traces. The curve becomes thicker and darker as the
phase space velocity decreases.
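
The projection behind these star plots can be illustrated with a few lines of Python. This is a minimal sketch, not the analysis code used for the figures: the function name `project_off_diagonal` is a hypothetical helper, and the choice of a 2D basis inside the hyperplane (needed for the actual plot) is left out.

```python
def project_off_diagonal(state):
    """Remove the component of `state` along the main diagonal (1, 1, ..., 1).

    The result lies in the hyperplane perpendicular to the diagonal, so its
    components sum to zero.
    """
    n = len(state)
    mean = sum(state) / n
    return [x - mean for x in state]


# Three attractors, the first one in an UP-state (offsets from rest in mV).
print(project_off_diagonal([6.0, 0.0, 0.0]))  # [4.0, -2.0, -2.0]
```

A state in which all attractors have the same mean potential maps to the origin, while a single dominant attractor pushes the projected point along one arm of the star.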

The synaptic plasticity mechanisms are chosen so as
to prevent a single attractor from becoming persistently active.
Excitatory-to-excitatory synapses are modeled as depressing,
which weakens the mutual activation of active
pyramidal cells over time. Additionally, the neurons feature an
adaptation mechanism, which suppresses prolonged firing.
Both mechanisms have the effect of weakening attractors
over time, such that, in contrast to a classic winner-take-all
(WTA) network, weaker patterns may also become active at times.

2.4.2 Synfire Chains 

Similar to classical Synfire Chain models (Diesmann et al, 
1999; Aviel et al, 2003; Kumar et al, 2008, 2010), the ver- 
sion chosen as a mapping benchmark consists of a chain 
of neuron groups connected in a feedforward fashion, with 
a certain delay in between. This allows spiking activity to 
propagate along the chain in a given direction (see Figure 19).
In addition to this, the benchmark Synfire Chain
model implements feedforward inhibition by subdividing
each group into a regular spiking (RS), excitatory (80%) and
a fast spiking (FS), inhibitory (20%) population (Kremkow
et al, 2010b,a). Inhibitory cells are also activated by feedforward
projections of excitatory cells from the previous group,
but project only locally onto the excitatory population of the 
same group with a small delay. This allows a fine control 
over the duration of spiking in a single group and prevents 
temporal broadening of the signal as it gets passed down 
along the chain. In the original model of Kremkow et al 
(2010b), a Synfire Chain group consists of 100 RS and 25 
FS cells. Every cell, RS or FS, receives a total of 60 excita- 
tory inputs from the previous RS population. Additionally, 
every RS cell receives input from all 25 inhibitory neurons 
of the FS population within its own group. The inhibition is 
tuned such that every excitatory neuron gets to spike exactly 
once upon activation (see Figure 19). 
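
The connectivity described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the benchmark's PyNN implementation: the function name, the neuron identifiers and the seeded random choice of inputs are hypothetical, and delays, weights and the external stimulation of the first group are not modeled.

```python
import random


def build_synfire_chain(n_groups, n_rs=100, n_fs=25, n_exc_in=60, seed=0):
    """Return (exc, inh) connection lists as (source, target) tuples.

    Neurons are identified as (group, "RS" | "FS", index).
    """
    rng = random.Random(seed)
    exc, inh = [], []
    for g in range(n_groups):
        targets = [(g, "RS", i) for i in range(n_rs)]
        targets += [(g, "FS", i) for i in range(n_fs)]
        if g > 0:
            # Feedforward excitation: every RS and FS cell draws n_exc_in
            # distinct presynaptic RS cells from the previous group.
            for tgt in targets:
                for src in rng.sample(range(n_rs), n_exc_in):
                    exc.append(((g - 1, "RS", src), tgt))
        # Local inhibition: every FS cell projects onto every RS cell
        # of its own group.
        for i in range(n_fs):
            for j in range(n_rs):
                inh.append(((g, "FS", i), (g, "RS", j)))
    return exc, inh


exc, inh = build_synfire_chain(n_groups=3)
print(len(exc), len(inh))  # 15000 7500
```

With three groups, two groups receive feedforward input (125 cells with 60 inputs each), and each of the three groups contains 25 x 100 inhibitory connections.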



[Schematic: chain of neuron groups, Group 1, Group 2, ..., Group n]




Fig. 19: Schematic of the Synfire Chain benchmark model. 

Methods to scale the size of this model up or down are
available and described in Section 2.5.1. Different architecture
sizes are used to benchmark the quality of the previously
described mapping process. See Section 3.3 for evaluation
data based on scaled benchmark models.

Fig. 20: Raster plot of characteristic RS activity of the Synfire
Chain without (top) and with (bottom) feedforward inhibition.
Note the constant spike packet width in case of the
active feedforward inhibition mechanism. 

2.4.3 Self-Sustained AI States

Randomly connected networks of integrate-and-fire neurons
are known to display asynchronous irregular (AI) activity
states, where neurons discharge with a high level of irregularity,
similar to stochastic processes, and with a low
level of synchrony (Brunel, 2000). These states were also
found in various other network models, including those using
conductance-based (Vogels and Abbott, 2005) and nonlinear
integrate-and-fire neuron models (Destexhe, 2009).
They were shown to have properties very similar to the
discharge patterns observed in awake animals (El Boustani
et al, 2007). Because cortical neurons are characterized by
nonlinear intrinsic properties (Connors and Gutnick, 1990),
our choice of an AI state benchmark is based on the AdEx
neuron model. These nonlinear IF cells are implemented in
the FACETS wafer-scale hardware (see Section 2.1.2) and
reproduce several cell classes observed experimentally in
cortex and thalamus (see Destexhe, 2009).

The particularity of the AI benchmark model is that it
allows testing the influence of the various cell classes on the
genesis of AI states by varying the different cellular properties.
The model considers the most prominent cell classes in
cerebral cortex, such as the regular spiking (RS) cell, the fast
spiking (FS) cell, the low-threshold spike (LTS) cell and the
bursting cells of the thalamus. It was found that randomly
connected networks of RS and FS cells with conductance-based
synaptic interactions can sustain AI states, but only
if the adaptation currents (typical of RS cells) are not too
strong. With strong adaptation, the network cannot sustain
AI states.

In contrast, adding another cell class characterized
by rebound responses (the LTS cell) greatly enhanced the
robustness of AI states, and networks as small as about 100
neurons can self-sustain AI states with a proportion of 5%
of LTS cells. Interestingly, if two such networks (one with
strong adaptation, another one with LTS cells) are reciprocally
connected, the resulting 2-layer network can generate
alternating periods of activity and silence. This pattern is
very similar to the Up- and Down-states observed in cortical
activity during slow-wave sleep (Steriade, 2003). Reducing
the adaptation leads to sustained AI states, and is reminis- 
cent of the transition from sleep to wakefulness, a sort of 
"wakening" of the network. In the context of this paper, we 
use two variants of such networks as benchmarks: First, a 
network of RS, FS and LTS cells as a minimal model of AI 
states. Second, a 2-layer cortical network displaying Up and 
Down states. The latter is depicted in Figure 21. 



[Schematic panels: Cortex Layer A, Cortex Layer B]



Fig. 21: Schematic of the Self-Sustained AI States bench- 
mark model. It consists of two cortical layers A and B. Every 
layer has an excitatory and an inhibitory population, each of 
which contains certain sets of cell types (RS, FS, LTS) that 
determine the network dynamics (see text for details). The 
excitatory populations project onto every other population, 
while the inhibitory populations only act within their layer. 

This model can also be scaled up and down in size
in order to benchmark the PyNN-to-hardware mapping pro- 
cess. In its default version, layer A consists of 1600 excita- 
tory RS and 400 inhibitory FS cells. Layer B contains 400 
excitatory neurons, 90% of which are RS and 10% of which 
are LTS type, as well as 100 inhibitory FS cells. Within a 
single layer the connection probability is 2% for a network 
size of 2000 cells. For smaller networks, such as layer B, the
connection probability is rescaled inversely to the network 
size. The inter-layer connectivity is excitatory only and has 
a connection probability of 1%. 
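
As a worked example of the inverse rescaling, the following sketch computes the intra-layer connection probability for both layers. It assumes that "network size" means the total number of cells in a layer (excitatory plus inhibitory); the function name `rescaled_probability` is hypothetical.

```python
def rescaled_probability(n_cells, p_ref=0.02, n_ref=2000):
    """Connection probability rescaled inversely to the network size,
    so that the expected number of inputs per neuron stays constant."""
    return min(1.0, p_ref * n_ref / n_cells)


layer_a = 1600 + 400   # excitatory RS + inhibitory FS cells
layer_b = 400 + 100    # excitatory (RS and LTS) + inhibitory FS cells

print(rescaled_probability(layer_a))  # 0.02
print(rescaled_probability(layer_b))  # 0.08
```

Under this reading, a neuron in either layer receives on average 40 intra-layer inputs.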

2.5 Analysis Based on Software Simulations 

Compared to pure software simulators, dedicated neuro- 
morphic hardware suffers more from limitations and im- 
perfections, which may either directly distort the morphol- 
ogy of the emulated network or influence its dynamics in 
more subtle ways. On one hand, physical limitations such as 
size and number of implemented circuits or communication
bandwidth impose rather inflexible constraints on parameters
such as number of neurons and synapses or the amount
of accessible data. On the other hand, as VLSI hardware 
is, inevitably, subject to manufacturing process variations, 
individual circuits have varying characteristics, which can 
only be compensated by calibration to a certain degree. As 
all these effects influence the dynamics of an emulated net- 
work simultaneously, it is usually very difficult to identify 
the connection between an individual cause and its effect. 
The most straightforward solution is to artificially impose 
individual hardware-specific distortions on software simulations,
identify their impact on the network's dynamics and
find, if possible, suitable compensation mechanisms. 

2.5.1 Network Scaling

It is very often the case that the robustness of a network 
scales together with its size, or, in specific cases, with the 
size or number of individual components. However, before 
analyzing the effects of distortions, it is indispensable to de- 
vise a way of scaling the (undistorted) network without in- 
fluencing its dynamics. We have developed specific rules for 
scaling two of our three benchmark models, in order to both 
explore and learn how to circumvent the limitations of the 
hardware. 

Layer 2/3 Attractor Memory The most obvious and natu- 
ral scaling of an attractor memory network lies in changing 
the number of attractors, i.e. in this particular case the num- 
ber of minicolumns per hypercolumn. Also, the size of the 
attractors can be evenly scaled by changing the number of 
units per attractor, i.e. the number of hypercolumns. Finally, 
the size of the minicolumns themselves can be scaled, by varying
the number of neurons per column (excitatory and inhibitory 
populations can be individually tuned by changing the num- 
ber of pyramidal and RSNP/basket cells, respectively). 

These changes would heavily affect the network dynam- 
ics, were they not accompanied by corresponding modifica- 
tions in the network connectivity. The behavior of the net- 
work is likely to remain unchanged if the excitation/inhi- 
bition patterns of each neuron are kept intact. This is most 
easily accomplished by keeping both the excitatory and the 
inhibitory fan-in of each neuron constant without modifying 
synaptic weights. To this end, simple scaling formulae (lin- 
ear with respect to size and number of the afferent popula- 
tions) for the connection probabilities between populations 
have been derived. 
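
The constant fan-in rule can be sketched as follows. The exact scaling formulae are not reproduced in this section, so the helper below is only an assumed form that is linear in the size and number of the afferent populations; the name `scale_probability` and its arguments are hypothetical.

```python
def scale_probability(p_ref, size_ref, count_ref, size_new, count_new):
    """Rescale a connection probability so the expected fan-in stays constant.

    Expected fan-in = p * (cells per afferent population)
                        * (number of afferent populations).
    Assumed form, not the authors' exact formula.
    """
    fan_in = p_ref * size_ref * count_ref
    p_new = fan_in / (size_new * count_new)
    if p_new > 1.0:
        raise ValueError("afferent populations too small to keep the fan-in")
    return p_new


# Pyramidal-to-pyramidal connections inside one minicolumn (25% of 30 cells):
# halving the minicolumn to 15 cells requires doubling the probability.
print(scale_probability(0.25, 30, 1, 15, 1))  # 0.5
```

The explicit error marks the point where a population has become too small to supply the required number of inputs, so that the probability alone can no longer preserve the fan-in.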

Figure 22 shows a scaling example where the number
of attractors is varied. At first glance, it may seem that the
characteristic attractor dynamics are affected, as the average
attractor dwell times decrease from about 300 ms to under
200 ms. However, this is only an apparent effect, as the temporary
dominance of individual attractors is a result of local
fluctuations in the input. An increasing number of attractors
means there is more competition among them, which
in mathematical terms translates to shorter, smaller fluctuations
in the input rate, therefore leading to decreasing dwell
times. When only two attractors are stimulated, the dynamics
are not influenced by the total number of attractors in the
network, which supports our scaling rules.

(b) 8 HC with 20 attractors

Fig. 22: L2/3 network, scaled down to 9 hypercolumns (HC)
with 3 attractors and scaled up to 8 hypercolumns with 20
attractors. Note the relatively long dwell times in (a) compared
to short dwell times in (b).



Synfire Chain with Feedforward Inhibition Scaling the Synfire
Chain is a comparatively simple task, as there are no
feedback or recurrent connections. Scaling the number of
units does not require any changes in connectivity. When
the number of neurons per unit is changed, the dynamics
can be kept unmodified (synchronous firing within a population)
if the number of inputs per neuron remains the same.
Therefore, modifying a population size by a factor a simply
requires that all connection probabilities are modified
by a factor 1/a. Some difficulties may arise when populations
become too small, as the binomial connection distribution
diverges away from a symmetric Gaussian, favoring a
smaller number of afferent connections and leading to activity
attenuation and eventually to a break in the pulse transmission
(Kumar et al, 2010). The straightforward remedy
is offered by the PyNN class FixedNumberPreConnector,
which guarantees a constant but randomly distributed number
of inputs. If populations become too small to accommodate
the required number of connections, synaptic weights
can be increased to compensate for the synaptic loss. The
same can be done to cope with synapse loss resulting from
the mapping process, as described in Section 3.1.1. Figure
23 shows a scaling example where both size and number of
populations are varied.

(a) 5 populations with 64 excitatory neurons each
(b) 32 populations with 200 excitatory neurons each
Fig. 23: Synfire Chain scaling examples.
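
The difference between probabilistic connectivity and a fixed number of inputs per neuron can be sketched in plain Python. This is an illustration only: it does not call PyNN, all function names are hypothetical, and the reference probability of 0.6 is simply the 60 inputs from 100 RS cells of Section 2.4.2 expressed as a probability.

```python
import random


def scaled_probability(p, a):
    """Population size changed by a factor a: probability changes by 1/a."""
    return p / a


def sample_fan_in_probabilistic(n_pre, p, rng):
    """Each potential input is realized independently with probability p."""
    return sum(1 for _ in range(n_pre) if rng.random() < p)


def sample_inputs_fixed(n_pre, k, rng):
    """Exactly k distinct inputs, chosen at random (fixed fan-in)."""
    return rng.sample(range(n_pre), k)


rng = random.Random(1)
p_ref, n_ref = 0.6, 100   # 60 expected inputs from 100 RS cells
a = 2.0                   # double the population size
p_new = scaled_probability(p_ref, a)

fan_ins = [sample_fan_in_probabilistic(int(n_ref * a), p_new, rng)
           for _ in range(1000)]
print(min(fan_ins), max(fan_ins))                           # scatters around 60
print(len(sample_inputs_fixed(int(n_ref * a), 60, rng)))    # always 60
```

The probabilistic scheme preserves the fan-in only on average, whereas the fixed scheme preserves it for every neuron, which is the property the text attributes to FixedNumberPreConnector.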



2.5.2 Simulating Hardware Imperfections 

For this study, we have investigated several distortion mech- 
anisms which can be replicated in software simulations and 
do not require the presence of the actual hardware.

A first limitation of the hardware lies in the fact that axonal
and dendritic delays cannot be programmed and the intrinsic
delays caused by the hardware communication infrastructure
are very small when translated to biological real
time. This means that, effectively, the hardware cannot accurately
reproduce the dynamics of networks which rely on
delayed transmission of action potentials between two communicating
neurons.

Two further distortion sources lie within the synaptic
circuits of the HICANN building blocks. Since the synaptic
weight resolution in a neuromorphic hardware system is
limited (see Section 2.1.2), large differences between synaptic
weights which are mapped to the same synapse driver
may cause significant changes to the synapses at the lower
end of the weight spectrum. Also, from our experience with
the FACETS chip-based systems (Brüderle and Müller et al,
2009), we know that variations in the manufacturing process
can lead to a spatial synaptic weight jitter of up to 40% of
the intended value (σ = 0.4μ, assuming a Gaussian distribution),
even after calibration. This might be fatal for networks
which rely on precise tuning of synaptic weights.
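
Both synaptic distortions can be imposed on a list of software weights with a few lines of Python. This is a minimal sketch under stated assumptions, not the code used in this study: the function names are hypothetical, the bit width is a free parameter chosen for illustration, all weights are assumed to share one linear scale set by the largest weight, and negative jittered values are clipped to zero.

```python
import random


def jitter_weights(weights, rel_sigma, rng):
    """Draw each weight from a Gaussian with mean w and sigma = rel_sigma * w.

    Negative draws are clipped to zero (an assumption of this sketch).
    """
    return [max(0.0, rng.gauss(w, rel_sigma * w)) for w in weights]


def discretize_weights(weights, n_bits):
    """Round weights onto a shared grid of 2**n_bits levels between 0 and
    the largest weight."""
    top = 2 ** n_bits - 1
    step = max(weights) / top
    if step == 0.0:
        return list(weights)
    return [round(w / step) * step for w in weights]


weights = [0.001, 0.004, 0.1]   # widely different weights on one shared scale
print(discretize_weights(weights, n_bits=4))
print(jitter_weights(weights, rel_sigma=0.4, rng=random.Random(0)))
```

In this example the smallest weight is rounded to zero and the second one is noticeably enlarged, which illustrates why synapses at the lower end of the weight spectrum are affected most.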

Because of the limited bandwidth of the communication 
layers (see Section 2.1.3), synapses may be lost during the 
biology-to-hardware mapping process. Ensuing distortions 
depend strongly on the network topology and can become 
arbitrarily high for sufficiently large and densely connected 
networks. 
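
Homogeneous synapse loss can be imposed on a software connection list as sketched below. The assumption here is that every synapse is dropped independently with the same probability; on the real system, loss depends on the network topology and on the mapping result. The function name `apply_synapse_loss` is hypothetical.

```python
import random


def apply_synapse_loss(connections, loss_fraction, seed=0):
    """Drop each connection independently with probability `loss_fraction`."""
    rng = random.Random(seed)
    return [c for c in connections if rng.random() >= loss_fraction]


# 10000 hypothetical (source, target) pairs with 40% homogeneous loss.
connections = [(i, (7 * i) % 100) for i in range(10000)]
kept = apply_synapse_loss(connections, loss_fraction=0.4)
print(len(kept))  # close to 6000
```

Loss that is localized to particular populations could be modeled by applying the same function to the corresponding subset of connections only.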

Additionally, neuron loss can also become a key factor, 
not necessarily due to hardware limitations (usually, synap- 
tic loss becomes significant long before the number of net- 
work neurons exceeds the number of neurons available on 
the hardware), but as an option to counteract synaptic loss 
by controlled reduction in the number of neurons. 

Although it does not apply to the three benchmark net- 
works we have studied, the hardware neuron model itself 
may become a limiting factor, when trying to map models 
which rely on more complex neurons. However, we con- 
sider this to be an unlikely case, as the AdEx model has been 
shown to successfully reproduce a wide variety of spike pat- 
terns (Brette and Gerstner, 2005; Destexhe, 2009; Millner 
et al, 2010) and has also proven very successful in emulating 
biological spiking activity (Naud et al, 2008). This is not the 
case for the FACETS chip-based neuromorphic system (see 
Section 2.1.5), which only offers simple leaky integrate-and-fire
neurons. Section 3.2 describes a setup where the missing
adaptation mechanism was compensated by tuning several 
other parameters. 

2.5.3 Analysis and Development of STDP in Hardware

Synaptic plasticity on the highly accelerated FACETS
wafer-scale hardware (for a detailed description see Section
2.1.2) provides a promising architecture for doing research
on learning. However, so far there are no studies on the effect
of low weight resolutions and limited update frequencies
on the functionality of single synapses and consequently of
neuronal networks. In the following, two directions of study
will be outlined and one detailed example will be given.

First, the question of a required minimal resolution for 
synaptic weights and their optimal update frequency is in- 
vestigated. However, those two restrictions may be dom- 
inated by production process variations that set the upper 
limit for the functionality of the synapses. Production pro- 
cess variations cause the supposedly identical circuits for 
causal and acausal correlation measurement to differ due 
to variations in their transistors. This asymmetry limits the 
accuracy of detecting correlations or in other words causes 
a correlation offset. With respect to learning neuronal net- 
works (e.g. Davison and Fregnac, 2006), we are especially 
interested in the effects of hardware synapses on their ability 
to detect synchronous input when embedded into an appro- 
priate architecture. 

Secondly, the dynamics of discretized STDP are ana- 
lyzed based on the assumption that the weight discretiza- 
tion is the most crucial restricting component influencing 
the dynamics of single synapses and whole networks. This 
analysis is carried out with respect to the equilibrium weight 
distribution that is obtained by evaluating an initial synaptic 
weight value in sequence. Within this sequence of weight 
evolution, the probability for causal evaluation is equal to 
the one for acausal evaluation. Analytical equilibrium dis- 
tributions (van Rossum et al, 2000) as well as numerical 
equilibrium distributions of continuous weights are used as 
a reference. 
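
The idea of evaluating a discrete weight in sequence, with equal probability for causal and acausal updates, can be sketched as a random walk over a look-up table. This is an illustrative sketch only: the weight-dependent update follows the general form of the Gütig rule, but the parameter values (`lam`, `mu`, `alpha`), the rounding scheme, the bit width and all function names are assumptions and do not describe the hardware look-up table.

```python
import random


def guetig_step(w, causal, lam=0.1, mu=0.4, alpha=1.0):
    """One continuous update of a weight w in [0, 1] (assumed parameters)."""
    if causal:
        return w + lam * (1.0 - w) ** mu
    return w - lam * alpha * w ** mu


def build_lut(n_bits, causal):
    """Map every discrete weight index to its successor after one update."""
    top = 2 ** n_bits - 1
    lut = []
    for i in range(top + 1):
        w_new = guetig_step(i / top, causal)
        lut.append(min(top, max(0, round(w_new * top))))
    return lut


def weight_histogram(n_bits=4, n_steps=10000, seed=0):
    """Random walk over discrete weights, causal and acausal equally likely."""
    rng = random.Random(seed)
    lut_causal = build_lut(n_bits, True)
    lut_acausal = build_lut(n_bits, False)
    w = 2 ** (n_bits - 1)            # start in the middle of the range
    hist = [0] * (2 ** n_bits)
    for _ in range(n_steps):
        w = lut_causal[w] if rng.random() < 0.5 else lut_acausal[w]
        hist[w] += 1
    return hist


print(weight_histogram())
```

The normalized histogram approximates the equilibrium distribution of the discretized rule and could be compared with a histogram obtained from the continuous update in the same way.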

Here, we shall discuss in detail one analysis of the effect
of low-resolution weights within a neuronal network. In
order to isolate the functionality of a single synapse from
network effects, a simple network is defined (Figure 40A). A
population of pre-synaptic neurons is connected to a single
post-synaptic neuron. The intermediate Gütig STDP model
(Gütig et al, 2003; Morrison et al, 2008) is used for the construction
of the look-up table (see Section 2.1.2). Developing
synaptic weights are compared for either correlated or
uncorrelated pre-synaptic input. Correlation within the pre-synaptic
population is generated by a multiple interaction
process (Kuhn et al, 2003), whereas in the uncorrelated case
the firing patterns of the pre-synaptic neurons are those of
Poisson processes. Results for the effect of discrete weights
on this network are presented in Section 3.2.4.

To avoid expensive changes of the chip layouts, the
hardware restrictions are analyzed with preparative software
simulations. To this end, the standard STDP synapse model
of the software simulator NEST (Gewaltig and Diesmann,
2007) was modified by introducing a causal and an acausal
accumulation circuit, a digital weight value and global weight
update controllers. In the following we will call this the
hardware-inspired model. As a reference model, another
software synapse model with continuous weight values and
continuous weight updates, but with a symmetric nearest-neighbor
spike pairing scheme, was implemented.

Further analysis with focus on the weight update fre- 
quency is in progress. In the current prototype of the 
HICANN building block, the causal and acausal accumulation
circuits are reset together whenever a weight update
is performed. Such a common reset distorts the counterbalancing
effect of the accumulation circuit receiving fewer correlations,
because the common reset prevents that circuit
from ever eliciting a weight update. Consequently the dominating
accumulation circuit, in terms of eliciting weight updates, 
drives all synaptic weights to its corresponding boundary 
value. For future improvements, the performance gained by 
adding a second reset line to reset both accumulation circuits 
independently will be compared to the performance gain due 
to a more detailed readout of these circuits. Details about 
these additional studies will be presented in a publication 
that is in preparation. 

3 Results 

In the following, a summary of results is provided, all 
of which have been acquired by means of the workflow 
described in Section 2. The presented data demonstrate 
the functionality of both the PyNN-to-hardware mapping 
framework and the virtual wafer-scale hardware system, as 
the applied benchmark models are shown to exhibit the ex- 
pected functionality when processed by this environment. 
Examples of mapping-specific distortion analyses based on
reference software simulations are provided and discussed. 
The effect of discretized synaptic weights, as implemented 
in the FACETS wafer-scale hardware, is analyzed in the con- 
text of an STDP benchmark. Scalability questions regard- 
ing the graph-based data structure for the mapping process 
are considered on the basis of experimental data. Further- 
more, we present first results of a successful application of 
the presented AdEx neuron circuit calibration scheme ac- 
quired with a HICANN prototype. 

3.1 Benchmark Results 

The benchmark models and their target output descriptions 
described in Section 2.4 represent an essential tool to test 
and verify the workflow presented in this article on a high, 
functional level. This is important especially in the context 
of studies on neural network dynamics, where the identifi- 
cation of erroneous components from the analysis of spa- 
tiotemporal spike patterns can be very difficult due to a lack 
of insight and intuition in the field of neural information pro- 
cessing principles. 



3.1.1 Distortion Mechanisms and Compensation Strategies 
Based on Software Simulations 

Even with a virtual version of the hardware system being 
available, software simulations remain a powerful analysis 
tool, especially since they offer access to the full range of 
dynamic variables of the modeled neural network, some of 
which may be inaccessible on the virtual hardware. In the 
following, we will demonstrate the effects of different dis- 
tortion mechanisms via software simulations and propose 
several methods to either circumvent or counteract them. 
These methods are chosen so as to lie within the possibilities
of the hardware system.

Layer 2/3 Attractor Memory The functionality of the L2/3 
network is best determined by a combined analysis of spike 
and voltage data. While a visual inspection of a raster plot 
of all neurons usually provides a good basis for evaluation, 
a more thorough investigation of UP/DOWN-state statistics
requires the analysis of voltages from a relatively large number
of individual cells. Neither the extraction of the full spike
data nor that of multiple voltage traces is possible on the
hardware, making the use of software simulations indispensable
for a proper evaluation of the effects of mapping distortions.

In order to replicate a biologically plausible pattern of 
axonal and dendritic delays, we have implemented a net- 
work geometry as exemplified in Figure 24, with distance- 
dependent delays. When setting all delays to zero, we have 
observed no significant changes in the network dynamics. 
This is not unexpected, as this model relies on firing
rates rather than on precise spike timing in order for particular
attractors to become activated.




Fig. 24: Geometry of the L2/3 Attractor Memory model. The
unit on the axes is µm.

Depending on the amount of spatial synaptic weight jitter,
the network shows varying levels of tolerance. For values
up to 25%, the dynamics only suffer minor changes. At
50% jitter, spontaneous activation is completely suppressed,
but activation via input from L4 remains functional, exhibiting
the same phenomena of pattern completion and rivalry
as seen in the undistorted case (see Figure 25).

(a) Spontaneous activity at 25% spatial jitter
(b) L4 activation at 50% spatial jitter

Fig. 25: Effects of spatial weight jitter on an L2/3 Attractor
Memory network with 9 hypercolumns and 3 attractors.

Because of its intricate connectivity, which spans both
local and global scales, the Layer 2/3 Attractor Memory network
was expected to be quite sensitive to synaptic loss. Indeed,
if the synapse loss is localized to certain attractors,
they become either inactive (for excitatory synapse loss) or
dominant (for inhibitory synapse loss). However, if synapse
loss is spread homogeneously over all populations, the network
becomes remarkably resilient, tolerating values as high
as 40% (see Figure 26).

(a) Raster plot, 40% synaptic loss
(b) Voltage star plot, 40% synaptic loss

Fig. 26: Synaptic loss tolerance of an L2/3 Attractor Memory
network. Synaptic loss was assumed homogeneous over
all populations.



In contrast to synaptic loss, the loss of pyramidal neurons
(which make up about 87% of the network) has
little effect on the network dynamics, even up to values as
high as 50%, regardless of the number of minicolumns or
hypercolumns present (see Figure 27). It is, for example,
possible to have a functioning network with as few as 12
pyramidal cells per minicolumn. This circumstance has major
consequences for synapse loss compensation.

When synapse loss increases beyond a certain limit,
intra-attractor excitation and inter-attractor inhibition become
too weak for attractor dynamics to emerge. The total
number of synapses scales linearly with the total number
of neurons (when network scaling conserves the afferent
fan-in, as described in Section 2.5.1), so reducing the
neuron count represents a straightforward way of circumventing
synapse loss. This can be achieved by reducing the
number of attractors (which may, however, not always be
desirable) or by reducing the number of neurons per attractor
by decreasing either the total number of hypercolumns
or the neuron count per minicolumn.

Fig. 27: Attractor dwell times versus neuron loss in the L2/3
Attractor Memory network.

(a) 60% synapse loss, uncompensated

Elimination of pyramidal neurons (without rescaling
the fan-in) is a much more efficient method in terms of 
synapse number reduction, as the total synapse count has 
an approximately quadratic dependence on the number of 
neurons per minicolumn. Since, for this particular network 
model, attractor dynamics are largely insensitive to pyra- 
midal cell elimination, as described above, this becomes a 
method of choice when dealing with harsh bandwidth limi- 
tations. 
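The two scaling regimes described above can be compared with a short sketch. It assumes a deliberately simplified picture that is not taken from the model definition: with conserved fan-in every neuron keeps a fixed number of afferent synapses, whereas without rescaling the synapses between two populations are drawn with a fixed connection probability. The function names and the numerical values for fan-in and connection probability are hypothetical.

```python
def synapses_with_conserved_fan_in(n_neurons, fan_in):
    """Synapse count if every neuron keeps `fan_in` afferent synapses.

    This grows linearly with the number of neurons.
    """
    return n_neurons * fan_in


def synapses_with_fixed_probability(n_pre, n_post, p_connect):
    """Expected synapse count for random connectivity with fixed probability.

    For n_pre == n_post == n this grows quadratically with n.
    """
    return p_connect * n_pre * n_post


if __name__ == "__main__":
    # Reduce the pyramidal cells per minicolumn from 30 to 12.
    FAN_IN = 100       # hypothetical afferent synapses per neuron
    P_CONNECT = 0.25   # hypothetical connection probability

    linear = (synapses_with_conserved_fan_in(12, FAN_IN)
              / synapses_with_conserved_fan_in(30, FAN_IN))
    quadratic = (synapses_with_fixed_probability(12, 12, P_CONNECT)
                 / synapses_with_fixed_probability(30, 30, P_CONNECT))
    print("remaining synapses, conserved fan-in:", linear)
    print("remaining synapses, fixed probability:", quadratic)
```

Under these assumptions, removing 60% of the cells in a population leaves 40% of the synapses when the fan-in is conserved, but only 16% of the recurrent synapses when the connection probability is kept fixed.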

Especially in cases where synapse loss is relatively small
and inhomogeneous, afferent synaptic input can be restored
by increasing the corresponding synaptic weights (see
Figure 28). While the required average firing rates of
individual populations can always be established in this
way, this compensation mechanism needs to be used
cautiously, as it can influence spike train statistics and
neuron input-output curves.

(b) 60% synapse loss, compensation by modified weights

Fig. 28: (a) High synaptic loss destroys attractor dynamics.
Several methods for compensating or counteracting this effect
are presented in the text above. (b) shows the results of
compensation by modified synaptic weights.

Synfire Chain with Feedforward Inhibition The Synfire
Chain model presented in Section 2.4.2 relies heavily on
delayed transmission of action potentials between inhibitory
and excitatory populations. Eliminating these delays causes
afferent EPSPs and IPSPs to overlap, possibly leading to
suppression of efferent spikes (Kremkow et al, 2010a). This
makes a direct mapping of the model to the FACETS
wafer-scale hardware impossible. However, as the hardware
offers the possibility of tuning synaptic time constants of
individual neurons, it is possible to compensate for missing
delays by adjusting the rising flank of EPSPs for the
inhibitory neurons. This can be achieved by increasing the
corresponding synaptic time constants and decreasing the
corresponding synaptic weights simultaneously (see Figure 29).
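The following sketch illustrates why a larger synaptic time constant can stand in for a missing delay. It assumes a simplified current-based synapse with an exponentially decaying input current acting on a leaky membrane, which need not match the hardware synapse model, and all parameter values are hypothetical. In this picture the PSP is a difference of two exponentials: its peak moves to later times as the synaptic time constant grows, and its amplitude grows as well, which is why the weight has to be reduced at the same time.

```python
import math


def psp(t_ms, weight, tau_m_ms, tau_syn_ms):
    """PSP of a leaky membrane driven by an exponentially decaying current.

    Arbitrary units (membrane capacitance set to 1); requires
    tau_m_ms != tau_syn_ms.
    """
    if t_ms < 0.0:
        return 0.0
    scale = weight * tau_m_ms * tau_syn_ms / (tau_m_ms - tau_syn_ms)
    return scale * (math.exp(-t_ms / tau_m_ms) - math.exp(-t_ms / tau_syn_ms))


def psp_peak_time(tau_m_ms, tau_syn_ms):
    """Time after spike arrival at which the PSP reaches its maximum."""
    return (tau_m_ms * tau_syn_ms / (tau_m_ms - tau_syn_ms)
            * math.log(tau_m_ms / tau_syn_ms))


def weight_for_equal_peak(weight, tau_m_ms, tau_syn_old_ms, tau_syn_new_ms):
    """Rescale the weight so that the PSP peak height stays unchanged."""
    old_peak = psp(psp_peak_time(tau_m_ms, tau_syn_old_ms),
                   1.0, tau_m_ms, tau_syn_old_ms)
    new_peak = psp(psp_peak_time(tau_m_ms, tau_syn_new_ms),
                   1.0, tau_m_ms, tau_syn_new_ms)
    return weight * old_peak / new_peak


if __name__ == "__main__":
    TAU_M = 10.0  # hypothetical membrane time constant in ms
    for tau_syn in (1.0, 5.0):  # hypothetical synaptic time constants in ms
        print("tau_syn =", tau_syn, "ms, peak after",
              round(psp_peak_time(TAU_M, tau_syn), 2), "ms")
    print("weight factor for equal peak height:",
          round(weight_for_equal_peak(1.0, TAU_M, 1.0, 5.0), 3))
```

Whether matching the peak height is the appropriate criterion depends on the firing threshold and on the timing of the inhibitory input; the sketch only shows the direction of the two parameter changes.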

Spatial synaptic weight jitter may effectively cancel itself
out for large numbers of synapses, but can lead to skewed
afferent input, especially in smaller networks. Depending on
the amount of spatial jitter (variance of the underlying
Gaussian, see Section 2.5.2), this might lead to individual
excitatory neurons not firing because their average afferent
weight is too low. This causes a chain reaction which leads
to an increasing number of silent neurons in every
subsequent population of the Synfire Chain, ultimately
causing the activity to die out (see Figure 30a).
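The averaging argument can be made quantitative with a small sketch. It assumes that each weight is drawn independently from a Gaussian with mean 1 and a relative standard deviation given by the jitter; this parametrization is an assumption for illustration and may differ from the definition in Section 2.5.2. The spread of a neuron's mean afferent weight then shrinks with the square root of its number of synapses.

```python
import math
import random


def relative_spread_of_mean_weight(jitter, n_synapses):
    """Analytical spread of the mean afferent weight (mean weight = 1)."""
    return jitter / math.sqrt(n_synapses)


def simulate_relative_spread(jitter, n_synapses, n_neurons=2000, seed=1):
    """Monte Carlo estimate of the same quantity.

    Each of `n_neurons` neurons receives `n_synapses` weights drawn from a
    Gaussian with mean 1 and standard deviation `jitter` (no clipping).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_neurons):
        total = sum(rng.gauss(1.0, jitter) for _ in range(n_synapses))
        means.append(total / n_synapses)
    grand_mean = sum(means) / n_neurons
    variance = sum((m - grand_mean) ** 2 for m in means) / (n_neurons - 1)
    return math.sqrt(variance)


if __name__ == "__main__":
    for n_syn in (10, 100, 1000):
        print(n_syn, "synapses:",
              round(relative_spread_of_mean_weight(0.5, n_syn), 4))
```

With 50% jitter, a neuron with 10 afferent synapses still sees a spread of roughly 16% in its mean weight, which is why small networks are affected more strongly than large ones.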

(a) Synfire Chain with delayed spike propagation
(b) Synfire Chain with modified synapses, no delays

Fig. 29: (a) Delayed spike propagation is essential in the
original Synfire Chain model. (b) By modifying synaptic
parameters (see text for details), effective afferent spike
times can be reproduced without propagation delays.

Synapse loss has qualitatively the same effect, but manifests
itself much more strongly, as it is not symmetrically
distributed around zero. Even relatively low values of around
2% completely suppress the propagation of the signal after
only a few iterations (see Figure 30b). Both distortion
mechanisms can be effectively compensated by increasing
excitatory synaptic weights (see Figure 31). Since all
excitatory neurons within a population spike only once,
simultaneously, modification of synaptic weights does not
affect spike train statistics.
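The weight scaling used for this compensation (a factor of a / (1 − p) for a synapse loss probability p, see the caption of Figure 31) can be checked with a minimal sketch. It assumes that every synapse is lost independently with probability p and that only the expected summed afferent weight matters; the function names and numerical values are hypothetical.

```python
def compensation_factor(loss_probability, a=1.0):
    """Weight scaling factor a / (1 - p) for synapse loss probability p."""
    if not 0.0 <= loss_probability < 1.0:
        raise ValueError("loss_probability must lie in [0, 1)")
    return a / (1.0 - loss_probability)


def expected_afferent_weight(n_synapses, weight, loss_probability):
    """Expected summed weight if each synapse is lost independently."""
    return n_synapses * (1.0 - loss_probability) * weight


if __name__ == "__main__":
    N_SYNAPSES, WEIGHT, LOSS = 100, 0.5, 0.2   # hypothetical values
    intact = expected_afferent_weight(N_SYNAPSES, WEIGHT, 0.0)
    distorted = expected_afferent_weight(N_SYNAPSES, WEIGHT, LOSS)
    restored = expected_afferent_weight(
        N_SYNAPSES, WEIGHT * compensation_factor(LOSS), LOSS)
    print(intact, distorted, restored)
```

For a = 1 the expected afferent input is restored exactly; values of a above 1 overcompensate the loss.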

The other obvious way of compensating synapse loss is 
by decreasing the overall network size, as described for the 
L2/3 Attractor Memory network. This can be achieved by 
decreasing either the number of populations or their size. 
Since the mapping algorithm tends to cluster neurons from 
the same population together on neighboring HICANNs, re- 
ducing population sizes is more efficient for reducing the 
required communication bandwidth. 

(a) Impact of spatial weight jitter on signal propagation
(b) Impact of synapse loss on signal propagation

Fig. 30: (a) Sufficiently high spatial weight jitter causes a
breakdown of signal propagation along the Synfire Chain.
(b) Synapse loss is even more critical, completely
attenuating the signal after only a few iterations.

3.1.2 Analysis Based on Virtual Hardware

The benchmark experiments were simulated with the virtual
hardware, thereby verifying the functionality of the whole
software workflow and the general capability of the system
to serve as a neural modeling platform.

Before applying the benchmark experiments (see
Section 2.4), we determined the maximum reachable bandwidth
of the L2 links (FPGA-to-DNC and DNC-to-FPGA) with
the aid of the virtual hardware. We found that, despite
the Poisson distribution of spiking activity, the achieved
bandwidth corresponds to the one theoretically expected
from the data link speed and pulse packet sizes.

The gathered results were used to enhance the routing
of L2 pulse events to the wafer (see Section 2.2.6), which
distributes external spike sources over all available L2
links, such that the bandwidth provided by a given hardware
setup is fully exploited. This is crucial for the realization
of network models with either high input rates, such as the
Layer 2/3 Attractor Memory model, or highly correlated
input, such as the Synfire Chain, where hundreds of spikes
need to be delivered within a very small time window. With
these limitations in mind, one can choose the size of the
hardware setup properly before actually mapping neural
experiments onto the FACETS wafer-scale hardware, such that
all spatial and temporal requirements (i.e. neuron / synapse
and bandwidth resources) are met.

Fig. 31: Synapse loss blocks the propagation of the signal
along the Synfire Chain (white zone). A synapse loss
probability of p can be compensated very efficiently by
scaling the weights by a factor of a / (1 − p) (thick black
line: a = 1, thick black line: a = 2.5). Note that the thin
black line stays within the light gray area that denotes a
stable propagation with one spike per neuron. Increasing the
synaptic weights even more, effectively overcompensating the
synapse loss, results in multiple spikes being fired by the
excitatory neurons (dark gray and black zone). The total
number of spikes per burst is limited by the refractory
period and the time until the first inhibitory spike,
meaning that the signal does not broaden indefinitely.

Synfire Chain The Synfire Chain model with feedforward
inhibition was successfully run on the virtual FACETS
wafer-scale hardware. The stable propagation of pulse
volleys from one group to the next is plotted in Figure 32.
In this case the network consisted of 16 groups with 100
excitatory and 25 inhibitory neurons each; the groups were
connected to form a loop such that the activity would be
sustained indefinitely. However, this model proved to be
very sensitive to distortions: If more than 2 neurons of a
group do not fire, the Synfire Chain will die out
immediately, because the local inhibition comes up too early
and prevents the excitatory cells from spiking. This is also
due to the lack of synaptic delays in the current
implementation of the hardware, as only L1 connections are
used for the routing of neural events within the network.
For upscaled versions of this model and a restricted
hardware size, where the mapping process yields a synapse
loss larger than 5%, the functionality could not be
sustained owing to the high sensitivity of the parameters
used, i.e. the pulse volley only reached the second group.
Nevertheless, we were able to regain the benchmark
functionality by compensating the synapse loss through
either strengthened weights or downscaled neuron
populations (see Section 3.1.1).

Fig. 32: Synfire Chain with 16 groups connected to form a
loop, simulated on the virtual hardware.



Layer 2/3 Attractor Memory Model The presented software
framework also performed very well when mapping an
architecture as elaborate as the Layer 2/3 Attractor Memory
model onto the (virtual) FACETS wafer-scale hardware:
Figure 33 shows the spike output of the default model with
2376 neurons simulated on a virtual wafer snippet containing
8×2 reticles. This successful replication of the benchmark's
dynamics not only underscores the correct operation of the
placing and routing algorithms (see Sections 2.2.5 and
2.2.6), but also indicates that the transformation from
biological to hardware models (see Section 2.2.7) works
properly and does not distort the model's behavior.
Concretely, in this example a variety of different
short-term plasticity settings could be transferred to
shared hardware configurations.

Fig. 33: Raster plot of the Layer 2/3 Attractor Memory
model simulated on the virtual hardware: firing activity is
shown only for pyramidal cells, default size with 9
hypercolumns and 8 attractors.

Self-Sustained AI States The cortical network with
self-sustaining AI states was also successfully realized on
the virtual hardware. The single-layer cortical model was
implemented for different sizes and parameter sets, and the
model functionality was preserved without distortions. The
two-layer cortical network exhibiting Up and Down states
was also realized at the default size with 2500 cells and
varying adaptation parameters; see Figure 34 for an
exemplary raster plot together with a reference software
simulation with NEURON.



3.2 Cross-Platform Implementation of a Benchmark Model 

As a demonstration of the versatility of the methodological 
framework discussed in the previous sections, this section 
will present the implementation of one of our benchmark 
models on three different back-ends: the software simula- 
tor NEST, the FACETS chip-based system and the virtual 
wafer-scale hardware. For this purpose, we have chosen the 
L2/3 Attractor Memory model, due to its challenging con- 
nectivity patterns and the interesting high-level functional- 
ity it displays. Because of the limited size of the chip back- 
end, the original model needed to undergo some profound 
changes, which will be detailed in the following sections. 

Fig. 34: Raster plot of the two-layer cortical network
exhibiting Up and Down states simulated with NEURON (a)
and on the virtual hardware (b). Horizontal lines depict the
limits between RS, FS and LTS neurons of layers A and B.
The first cortical layer consists of 2000 cells, the second of
500 cells. 10% of all cells are initially stimulated to induce
asynchronous irregular firing in the whole network. The first
layer is per se not self-sustaining, i.e. the activity dies out
after a while; the second, smaller layer is able to sustain its
activity due to a large number of LTS cells. The sparse
connectivity between the two layers ensures that the activity
in the first is reactivated by excitatory input from the
second layer.

3.2.1 FACETS Chip-Based Neuromorphic System

One ASIC in the current version of the FACETS chip-based
system, as described in Section 2.1.5, offers 192 fully
interconnectable leaky integrate-and-fire neurons with static
synapses. Since the original model requires 2376 adapting
neurons interconnected through plastic synapses, we had to
heavily modify the network configuration in order to keep
its functionality. Reducing the total number of neurons from
2376 to 192 was done following the scaling rules described
in Section 2.5.1. In this context, the observation that
pyramidal cells can be lost without significantly affecting
the dynamics of the network became extremely useful. In
order to provide relatively long dwell times, we have chosen
a setup with only three attractors and four hypercolumns
(i.e. four minicolumns per attractor). The number of basket
cells per hypercolumn was reduced from the original 8 to 6,
while the number of pyramidal cells per minicolumn was
reduced from the original 30 to 12. The number of RSNP cells
per minicolumn remained constant at 2. Thus, this setup
implements the original model architecture with exactly 192
neurons. See Figure 35 for a schematic of the resulting
architecture.
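The neuron counts quoted above can be verified with a short calculation. The sketch assumes that every hypercolumn contains one minicolumn per attractor and that pyramidal and RSNP cells are counted per minicolumn (as in the caption of Figure 35), while basket cells are counted per hypercolumn; the function name is hypothetical.

```python
def attractor_network_size(n_hypercolumns, n_attractors,
                           pyramidal_per_minicolumn,
                           rsnp_per_minicolumn,
                           basket_per_hypercolumn):
    """Return the population sizes of an L2/3 Attractor Memory network."""
    # One minicolumn per attractor in every hypercolumn.
    n_minicolumns = n_hypercolumns * n_attractors
    counts = {
        "pyramidal": n_minicolumns * pyramidal_per_minicolumn,
        "rsnp": n_minicolumns * rsnp_per_minicolumn,
        "basket": n_hypercolumns * basket_per_hypercolumn,
    }
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    # Default model: 9 hypercolumns, 8 attractors.
    print(attractor_network_size(9, 8, 30, 2, 8))
    # Scaled-down model for the chip: 4 hypercolumns, 3 attractors.
    print(attractor_network_size(4, 3, 12, 2, 6))
```

With these numbers the default configuration yields 2376 neurons and the scaled-down configuration yields 144 pyramidal, 24 RSNP and 24 basket cells, i.e. 192 neurons in total.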

Fig. 35: Geometry of the scaled-down L2/3 Attractor Memory
network model. Note the greatly reduced number of
hypercolumns (HC) and of pyramidal cells per minicolumn
(MC) as compared to Figure 24.

Due to a lack of neural adaptation and synaptic plasticity
(which are both crucial in the original model, as they
limit the pyramidal cell UP-state duration), we needed to
adapt the neuron parameters (leak conductance, reset, rest,
threshold and reversal potentials) and synapse
characteristics (weight, decay time constant) in such a way
as to retain as much as possible of the original dynamics,
on average. One additional constraint which needed to be
taken into account was the limited range of synaptic weights
available on the neuromorphic chip. We were able to
compensate for this, to some extent, by modifying the
connection densities among the neuron populations.

One important consequence is that, because the network
is unable to adapt, its dynamics change significantly. If
one were to remove only adaptation and plasticity, without
changing other parameters, the first attractor to become
activated would last indefinitely. Therefore, the removal of
these two mechanisms needs to be accompanied by a reduction
of intra- and inter-columnar excitation. This in turn causes
the network to become much more input driven, which
manifests itself in an extreme sensitivity of attractor
dwell times towards the momentary input activity. Dwell
times become more erratic and even small changes in the
average input rate cause attractors to become either
dominant or virtually inactive.

Also, due to the limited input bandwidth of the ASIC 
(for the chosen architecture: 64 channels at about 80 Hz), 
some degree of input correlation was inevitable, as each of 
the 144 pyramidal cells requires a Poisson stimulation of 
300 Hz. In order to maintain attractor stability, we have cho- 
sen to have no overlapping inputs for different attractors 
(and thus zero correlation, for the Poisson input we have 
used). This, on the other hand, leads to an increased input 
correlation among pyramidal cells belonging to the same at- 
tractor, which, in absence of adaptation, tends to prolong 
attractor dwell times. 
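A rough estimate shows why input correlations are unavoidable with these numbers. The sketch assumes a hypothetical stimulation scheme: every pyramidal cell receives the superposition of whole input channels, the channels are split evenly among the attractors without overlap, and each cell picks its channels uniformly at random from the pool of its attractor. Only the channel count, the channel rate, the number of pyramidal cells and the target rate are taken from the text.

```python
import math


def input_sharing_estimate(n_channels=64, channel_rate_hz=80.0,
                           n_pyramidal=144, target_rate_hz=300.0,
                           n_attractors=3):
    """Estimate how strongly pyramidal cells must share input channels."""
    supply_hz = n_channels * channel_rate_hz
    demand_hz = n_pyramidal * target_rate_hz
    # Whole channels needed per cell to reach at least the target rate.
    channels_per_cell = math.ceil(target_rate_hz / channel_rate_hz)
    # Non-overlapping channel pools, one per attractor.
    channels_per_attractor = n_channels // n_attractors
    # Expected fraction of a cell's channels shared with another cell of
    # the same attractor, for uniformly random channel subsets.
    shared_fraction = channels_per_cell / channels_per_attractor
    return {
        "supply_hz": supply_hz,
        "demand_hz": demand_hz,
        "reuse_factor": demand_hz / supply_hz,
        "channels_per_cell": channels_per_cell,
        "channels_per_attractor": channels_per_attractor,
        "shared_fraction": shared_fraction,
    }


if __name__ == "__main__":
    for key, value in input_sharing_estimate().items():
        print(key, value)
```

Under these assumptions the cells demand more than eight times the total rate the input channels can supply, so each input spike has to be delivered to several cells, and two cells of the same attractor share about a fifth of their input channels on average.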
(a) Poisson input with an overall rate of 750, 1 and 550 Hz
(b) Poisson input with an overall rate of 750, 700 and 650 Hz
(c) Poisson input with an overall rate of 700, 800 and 600 Hz

Fig. 36: Raster plots of scaled-down L2/3 Attractor Memory
networks on the FACETS chip-based neuromorphic system.
For explanation see text.

Figure 36 shows L2/3 Attractor Memory benchmark results
acquired with the FACETS chip-based system. (a) Attractors
have been excited by Poisson input with an overall rate of
750, 1 and 550 Hz, respectively. Note the relatively long
dwell times, which are mostly due to high correlations among
pyramidal inputs within an attractor. The discrepancy in the
input activity needed to ensure a balanced activation of
attractors 1 and 3 is due to hardware manufacturing
fluctuations, which appear to be very complex in nature,
often interacting with each other and being highly dependent
on the ongoing activity on the chip. (b) Attractors have
been excited by Poisson input with an overall rate of 750,
700 and 650 Hz, respectively. Note the large fluctuations in
attractor dwell time due to the lack of adaptation, which
leads to strongly input-driven dynamics. Also note that, in
contrast to Figure 36a, when attractor 2 becomes active, the
input activity of attractor 3 required an increase of 100 Hz
in order to achieve balanced activation. The most likely
cause is capacitive cross-talk between the analog circuits,
which varies depending on the throughput rate. (c) Attractors
have been excited by Poisson input with an overall rate of
700, 800 and 600 Hz, respectively. Note that only a slight
increase of the attractor 2 input rate, with respect to the
other attractors, results in almost complete dominance of
attractor 2. Again, this is due to the lack of adaptive
mechanisms.

3.2.2 Virtual Hardware 

The scaled-down version of the L2/3 Attractor Memory net- 
work model was successfully implemented on the virtual 
hardware (see Section 2.3). No changes had to be applied 
to the model in order to realize it on the FACETS wafer- 
scale virtual hardware, as the HICANN building block im- 
plements AdEx-type neurons, which include the dynamics 
of the leaky integrate-and-fire neurons from the chip-based 
system. The scaled-down model passed through the whole 
mapping process described in Sections 2.2.2, 2.2.5, 2.2.6 
and 2.2.7 and was finally mapped and simulated on a snippet
of 2×2 reticles of a wafer.

The results of the virtual hardware simulation of the 
scaled-down L2/3 Attractor Memory network can be seen 
in Figure 37, where the individual attractors were stimu- 
lated with different rates. Depending on the specific stim- 
ulation, the network reliably exhibits the same behavior as 
reference software simulations, which are described in the 
following section. Figure 38 shows the 3-D visualization of 
the network model and its mapping onto the wafer with the 
GraViTo software (see Section 2.2.9). 

(a) Poisson input with overall rate of 550, 1 and 550 Hz
(b) Poisson input with overall rate of 700 Hz
(c) Poisson input with overall rate of 700, 800 and 700 Hz

Fig. 37: Scaled-down L2/3 Attractor Memory network
simulated with the virtual hardware. Attractors have been
excited by Poisson input with different overall rates.

3.2.3 NEST Simulator

The same PyNN script as in the previous sections was
eventually used with the software simulator NEST, as a means
of comparison to an ideal simulation environment. By
providing identical parameter settings, one can thereby
gauge the effects of the hardware-inherent fluctuations.

The results are practically identical to the ones from the
virtual hardware (see Figures 37 and 39), perhaps not
surprisingly, as the latter is not subject to
hardware-specific manufacturing process fluctuations.

Also, due to its small size, the network does not pose any
challenge to the mapping algorithm, making the hardware
realization a perfect replica of its software counterpart.
Still, the successful emulation offers a convincing proof of
the efficacy of our mapping workflow.

The chip-based neuromorphic device, on the other hand,
is subject to the full range of hardware-specific
distortions. Nevertheless, the resulting network dynamics
agree well with the NEST results, requiring only small
adjustments in the input activity. These results are
expected to greatly improve on the wafer-scale hardware,
thanks to the superior architecture of the HICANN units.
Also, a much more complex neuron model and the availability
of both short- and long-term synaptic plasticity mechanisms
will make the wafer-scale hardware much more capable of
emulating biologically accurate network models.

In conclusion, we note that the software results are in
very good agreement with the ones generated by our hardware
back-ends, thus supporting our workflow concept and
solidifying the position of our neuromorphic hardware as a
universal modeling tool. The particularly appealing feature,
especially from a neural modeling perspective, is the
seamless transition from software simulation to hardware
emulation, which, from the perspective of the PyNN user, is
accomplished by modifying a single line of code.
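The single-line back-end switch can be illustrated with a minimal sketch. It assumes a recent PyNN release (0.8 or later) and uses only standard PyNN calls; the tiny network, its parameter values and the helper names `load_backend` and `run_experiment` are hypothetical and unrelated to the benchmark scripts used in this work. The module name of the hardware back-end is not given in this section, so only the NEST module appears in the usage comment.

```python
import importlib


def load_backend(module_name):
    """Import a PyNN back-end module by name, e.g. "pyNN.nest"."""
    return importlib.import_module(module_name)


def run_experiment(sim, n_neurons=10, input_rate_hz=300.0,
                   duration_ms=1000.0):
    """Build and run a tiny network using only back-end agnostic calls."""
    sim.setup(timestep=0.1)
    stimulus = sim.Population(
        n_neurons, sim.SpikeSourcePoisson(rate=input_rate_hz))
    neurons = sim.Population(n_neurons, sim.IF_cond_exp())
    sim.Projection(stimulus, neurons, sim.OneToOneConnector(),
                   sim.StaticSynapse(weight=0.01),
                   receptor_type="excitatory")
    neurons.record("spikes")
    sim.run(duration_ms)
    data = neurons.get_data()
    sim.end()
    return data


# Usage (requires PyNN and the chosen simulator to be installed):
#   sim = load_backend("pyNN.nest")   # the one line that selects the back-end
#   data = run_experiment(sim)
```

Because the network description only touches the `sim` object, exchanging the module name passed to `load_backend` is the only change needed to target a different back-end, provided that back-end supports the cell and synapse types used.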
Fig. 38: 3-D view of the scaled-down L2/3 Attractor Memory
network model (see Figure 35) and its mapping to the
wafer generated by the GraViTo software (Section 2.2.9).

(b) Poisson input with overall rate of 700 Hz
(c) Poisson input with overall rate of 700, 800 and 700 Hz

Fig. 39: Scaled-down L2/3 Attractor Memory network
simulated with the NEST software. Attractors have been
excited by Poisson input with different overall rates.

Fig. 40: A STDP evaluation network layout with variable correlation strength c. Gray circles represent neurons, arrows
synapses. The spike trains of neurons contain random (black) and correlated (red) spikes. B Mean (thick line) and standard
deviation (shaded area) of reference STDP synapses with correlated (green) and uncorrelated (blue) pre-synaptic neurons.
Thin lines are single example traces. C Like B, but with hardware inspired STDP synapses. D Weight distributions after
1000 s for reference (p < 0.001) and hardware inspired (p < 0.05) STDP synapses.



3.2.4 Analysis and Development of STDP in Hardware 

The effects of discrete synaptic weights on networks are
analyzed by means of a simple network (Figure 40 A). Ten
pre-synaptic neurons are connected to one post-synaptic
neuron using both the reference and the hardware inspired
STDP synapses as described in Section 2.5.3. In order to
analyze the effects of discrete weights in isolation from
other hardware-specific constraints, the weight update
interval is set equal to the time resolution h = 0.1 ms of
the software simulator. The spike rates of the pre-synaptic
neurons are adapted in such a way that the post-synaptic
neuron fires at about 10 Hz. In the case of correlated
pre-synaptic neurons, their correlation coefficient is
c = 0.05 (Kuhn et al, 2003). Varying the input spike rates
or the correlation coefficient does not change the
conceptual outcome. As the currently implemented hardware
synapses have a weight resolution of 4 bits, this resolution
is used to test the performance of the hardware.
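To give a feeling for what a 4-bit weight resolution implies, the following sketch maps a continuous weight onto one of 16 equidistant values. The linear mapping between 0 and a maximum weight, the clipping and the function name are assumptions made for illustration and are not taken from the hardware specification or from Section 2.5.3.

```python
def quantize_weight(weight, w_max, bits=4):
    """Round a weight to the nearest of 2**bits equidistant levels.

    The levels span the range [0, w_max]; weights outside this range
    are clipped.
    """
    steps = 2 ** bits - 1                    # 15 steps give 16 levels
    clipped = min(max(weight, 0.0), w_max)
    return round(clipped / w_max * steps) * w_max / steps


if __name__ == "__main__":
    W_MAX = 1.0  # hypothetical maximum weight in arbitrary units
    levels = sorted({quantize_weight(i / 1000.0, W_MAX)
                     for i in range(1001)})
    print(len(levels), "levels, step size", W_MAX / 15)
```

With this mapping the smallest possible weight change is one fifteenth of the maximum weight, which illustrates how coarse the steps between adjacent discrete weights are compared to a continuous reference synapse.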

Figure 40 shows the mean weight traces for runs with
correlated pre-synaptic neurons as well as for separate runs
with uncorrelated pre-synaptic neurons. In the case of
hardware inspired STDP synapses (Figure 40 C), the standard
deviations of the mean weight traces are much larger than
those of the reference STDP synapses. These increased
deviations are due to the large weight steps between
adjacent discrete weights. Applying a t-test to the synaptic
weight distribution after 1000 s shows that the hardware
inspired STDP synapses can nevertheless distinguish between
uncorrelated and correlated input. For hardware inspired
STDP synapses the test yields p = 0.02, compared to
p = 2 · 10^-[…] for the reference STDP synapses. This
ability to distinguish determines the ability to detect
synchronous input, which is fundamental for most STDP
applications. For correlation coefficients as low as
c = 0.05, a resolution of 4 bits is still sufficient to
detect synchronous pre-synaptic firing.

3.3 Software Performance 

The usability of any hardware modeling platform strongly 
depends on the time needed for configuration and repro- 
gramming, thus the benchmarks introduced in Section 2.4 
also serve as tests for scalability in terms of time and space. 

Figure 41 shows that the space consumption for the
BioModel data grows almost linearly depending on the number
of neurons and the synaptic density ρ_syn. Thus, for the
given benchmarks, the model sizes for networks with a neuron
count N_BIO < 10^[…] and an approximate average ρ_syn < 10%
stay within an acceptable limit of 10 GByte. Furthermore,
the placement algorithms, in spite of the cubical problem,
grow below O(n^[…]) and as such fulfill the requirement of a
reasonable runtime for complex mapping problems (Ehrlich
et al, 2010).

Fig. 41: Mapping process scaling in terms of BioModel sizes
(Ehrlich et al, 2010).

3.4 HICANN Prototype Calibration

In order to provide an example of its functionality, the
calibration framework described in Section 2.2.7 has been
used to reproduce a biologically relevant tonic spiking
neuron on the HICANN prototype. First, a reference
simulation of a tonic spiking neuron using the AdEx model
was created. For this simulation, the adaptation and
exponential terms were disabled. The simulated neuron showed
a firing rate of 53.4 Hz, which, due to the speedup factor
of the HICANN system, corresponds to 534 kHz in the hardware
domain.

The calibration was performed on a hardware neuron,
and the calibration data was stored in the database. Then, the
biological parameters from the reference simulation were
sent to the database, which provided the necessary hardware
parameters in return. The floating gates of the corresponding
neuron on the HICANN prototype were then programmed
with these values. The results are shown in Figure 42. After
calibration, the hardware neuron showed a firing rate of 536
kHz, which is very close to the reference simulation.

(a) Reference simulation
(b) Hardware neuron recording

Fig. 42: Comparison between the reference simulation (a)
and the membrane recording of the hardware neuron after
calibration (b).

4 Discussion

Within the FACETS research project, a novel type of
neuromorphic hardware system has been developed. The device
combines massive acceleration, large network sizes and
high configurability with the possible advantages inherent
to analog designs, such as power efficiency and
time-continuous operation. Following this strategy, neuromorphic
engineering has the potential to step out of its niche and pro- 
vide new and relevant input to neuroscience, e.g. towards the 
understanding of cortical dynamics. Still, as we noticed dur- 
ing the development of our specific system and during first 
experiments with prototypes, the originally available tech- 
niques and tools were clearly insufficient for exploiting the 
potential of such devices in a modeling context. It is our ex- 
perience that the quality of interfaces that make hardware 
flexibility actually usable is as essential as the electronic 
substrate itself. 

The presented work approaches this challenge by intro- 
ducing a methodological framework that establishes a bal- 
ance between the configuration complexity and potential 
of a novel hardware system on the one hand and the us- 
ability and spectrum of possible applications for modelers 
without a hardware background on the other. This neuro- 
morphic modeling workflow has been depicted both in its 
conceptual whole and by means of detailed component de- 
scriptions. It represents one major outcome of the inter- 
disciplinary collaboration among FACETS partners, thereby 
integrating expertise and progress in the fields of physio- 
logically well-founded cortex modeling, hardware engineer- 
ing and community-driven software development. The mul- 
titude of the described components and their structured in- 
teraction reflects the comprehensiveness we are aiming at. 

We showed experimental data that provide a proof of 
mature functionality of the implemented stack of model- 
to-hardware translation tools. The experimental results of 
mapping distortion studies on the basis of our virtual wafer- 
scale hardware system and reference software simulations 
represent examples of ongoing analysis work that
continuously improves our software layer stack, the hardware
design and our neuromorphic modeling experience. A
dedicated follow-up publication focusing on these analysis
efforts is in preparation. In particular, this forthcoming
work will focus on computational aspects and address
many questions that remain open at this point, especially
concerning the computational and functional limitations that
are imposed on the network models by the presented
concepts.

The architectures in the presented benchmark collection
already cover a wide spectrum of computationally
interesting aspects and relevant connectivity structures. But
although the workflow presented in this paper enables us
to successfully realize these benchmarks with the FACETS
hardware devices, further validation of the introduced
concepts will be required. Important features like the
synaptic plasticity mechanisms and large regions of the
technically available hardware configuration space have not
yet been systematically explored with our workflow in
network contexts. And, as soon as alternative neuromorphic
devices with a comparable degree of configurability and size
(but possibly different solutions and components) become
available from other groups, all applicable aspects of the
described neuromorphic workflow will have to be tested with
these platforms as well.

In addition to such necessary investigations, we plan to 
extend the set of models that we use to benchmark and tune 
our workflow. So far the realization of a large variety of 
biologically relevant structures has been the primary goal 
of iteratively applying the depicted optimization process. A 
second focus will be put on computationally powerful archi- 
tectures in general, independent of their biological plausibil- 
ity. Building upon this work, the presented methodological 
framework with the neuromorphic hardware system at its 
core will eventually be used to approach open neuroscien-
tific questions.